Basic Practices in Assembly Language Programming


Introduction

Assembly language is a low-level programming language used on niche platforms such as IoT devices, device drivers, and embedded systems. It is usually a language that Computer Science students cover in their coursework and rarely use in their later jobs. Even so, according to the TIOBE Programming Community Index, assembly language has recently enjoyed a steady rise in the rankings of the most popular programming languages.

In the early days, an application written in assembly language had to fit in a small amount of memory and run as efficiently as possible on slow processors. Now that memory is plentiful and processor speed has increased dramatically, we mainly rely on high-level languages with ready-made structures and libraries. When necessary, assembly language can still be used to optimize critical sections for speed or to access non-portable hardware directly. It also continues to play an important role in embedded system design, where performance and efficiency remain important requirements.

In this article, we’ll talk about some basic criteria and coding skills specific to assembly language programming, with an emphasis on execution speed and memory consumption. I’ll analyze examples related to registers, memory, and the stack, operators and constants, loops and procedures, system calls, etc. For simplicity, all samples are 32-bit, but most ideas apply easily to 64-bit.

All the material presented here comes from my years of teaching [1]. To read this article, you need a general understanding of Intel x86-64 assembly language, and familiarity with Visual Studio 2010 or above is assumed. Preferably, you have also read Kip Irvine’s textbook [2] and the MASM Programmer’s Guide [3]. If you are taking an Assembly Language Programming class, this could serve as supplemental reading.

About instruction

The first two rules are general. If you can use less, don’t use more.

1. Using fewer instructions

Suppose that we have a 32-bit DWORD variable:

.data
   var1 DWORD 123

The task is to add var1 to EAX. This works with MOV and ADD:

mov ebx, var1
add eax, ebx

But since ADD accepts a memory operand, you can simply write

add eax, var1

2. Using an instruction with fewer bytes

Suppose that we have an array:

.data
   array DWORD 1,2,3

If you want to rearrange the values to be 3,1,2, you could write

mov eax,array           ;        eax =1
xchg eax,[array+4]      ; 1,1,3, eax =2
xchg eax,[array+8]      ; 1,1,2, eax =3
xchg array,eax          ; 3,1,2, eax =1

But notice that the last instruction should be MOV instead of XCHG. Both store the 3 held in EAX into the first array element, but the other half of the exchange, loading the old element into EAX, is logically unnecessary.

Code size is another reason to choose MOV here: MOV takes 5 bytes of machine code while XCHG takes 6. In addition, XCHG with a memory operand is implicitly locked by the processor, which makes it slower than a plain MOV:

00000011  87 05 00000000 R      xchg array,eax
00000017  A3 00000000 R         mov array,eax

To check machine code, you can generate a listing file when assembling or open the Disassembly window at runtime in Visual Studio. You can also look it up in the Intel instruction manual.

About register and memory

In this section, we’ll use a popular example, the nth Fibonacci number, to illustrate multiple solutions in assembly language. The C function would be like:

unsigned int Fibonacci(unsigned int n)
{
    unsigned int previous = 1, current = 1, next = 0;
    for (unsigned int i = 3; i <= n; ++i) 
    {
        next = current + previous;
        previous = current;
        current = next;
    }
    return current;   // also correct for n = 1 and n = 2, where the loop never runs
}

3. Implementing with memory variables

At first, let’s copy the same idea from above with two variables, previous and current, created here:

.data
   previous DWORD ?
   current  DWORD ?

We can use EAX to store the result without the next variable. Since MOV cannot move from memory to memory, a register like EDX must be involved for the assignment previous = current. The following is the procedure FibonacciByMemory. It receives n in ECX and returns the nth Fibonacci number in EAX:

;------------------------------------------------------------
FibonacciByMemory PROC 
; Receives: ECX as input n 
; Returns: EAX as nth Fibonacci number calculated
;------------------------------------------------------------
   mov   eax,1         
   mov   previous,0         
   mov   current,0         
L1:
   add eax,previous       ; eax = current + previous      
   mov edx, current       ; previous = current
   mov previous, edx
   mov current, eax
loop   L1
   ret
FibonacciByMemory ENDP

4. If you can use registers, don’t use memory

A basic rule in assembly language programming is that if you can use a register, don’t use a variable. Register operations are much faster than memory operations. The general purpose registers available in 32-bit mode are EAX, EBX, ECX, EDX, ESI, and EDI. Leave ESP and EBP alone, since they serve as the stack pointer and the stack frame pointer.

Now let EBX replace the previous variable and EDX replace current. The following is FibonacciByRegMOV, with only three instructions in the loop body:

;------------------------------------------------------------
FibonacciByRegMOV PROC 
; Receives: ECX as input n 
; Returns: EAX, nth Fibonacci number
;------------------------------------------------------------
   mov   eax,1         
   xor   ebx,ebx      
   xor   edx,edx      
L1:
   add  eax,ebx      ; eax = current + previous
   mov  ebx,edx      ; previous = current
   mov  edx,eax      ; current = eax
loop   L1
   ret
FibonacciByRegMOV ENDP

A further simplified version uses XCHG, which steps up the sequence without the need for EDX. The following shows the FibonacciByRegXCHG machine code in its listing, where the loop body holds only two instructions totaling three bytes of machine code:

           ;------------------------------------------------------------
000000DF    FibonacciByRegXCHG PROC
           ; Receives: ECX as input n
           ; Returns: EAX, nth Fibonacci number
           ;------------------------------------------------------------
000000DF  33 C0         xor   eax,eax
000000E1  BB 00000001   mov   ebx,1
000000E6             L1:
000000E6  93            xchg eax,ebx      ; step up the sequence
000000E7  03 C3         add  eax,ebx      ; eax += ebx
000000E9  E2 FB      loop   L1
000000EB  C3            ret
000000EC    FibonacciByRegXCHG ENDP
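
To double-check the register shuffle, the loop can be modeled in C. This is only a sketch: the function name fib_xchg_model is made up for illustration, a and b stand in for EAX and EBX, and n is assumed to be at least 1 (with ECX = 0 the real LOOP instruction would wrap around instead of stopping).

```c
#include <stdint.h>

/* C model of the FibonacciByRegXCHG loop (hypothetical helper name).
   a plays the role of EAX, b the role of EBX. Assumes n >= 1. */
uint32_t fib_xchg_model(uint32_t n)
{
    uint32_t a = 0;                 /* xor eax,eax */
    uint32_t b = 1;                 /* mov ebx,1   */

    while (n-- > 0) {               /* loop L1 */
        uint32_t t = a;             /* xchg eax,ebx */
        a = b;
        b = t;
        a += b;                     /* add eax,ebx */
    }
    return a;                       /* nth Fibonacci number */
}
```

Calling fib_xchg_model(1) and fib_xchg_model(2) both give 1, matching the sequence 1, 1, 2, 3, 5 produced by the assembly version.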

In concurrent programming

On a uniprocessor, an interrupt is only taken between instructions, so a single instruction that reads, modifies, and writes its operand cannot be split by a context switch. In that sense the x86-64 instruction set provides many atomic instructions, which help avoid race conditions in multitasking. On a multiprocessor, a memory operand additionally needs the LOCK prefix (XCHG with a memory operand is locked implicitly). These instructions can be used directly by compiler and operating system writers.

5. Using atomic instructions

As seen above, XCHG, the so-called atomic swap, does in one instruction what takes several statements in some high-level languages:

xchg  eax, var1

A classical way to swap a register with a memory variable var1 would be

mov ebx, eax
mov eax, var1
mov var1, ebx

Moreover, if you use the Intel486 instruction set or above with the .486 directive, the atomic XADD makes the Fibonacci procedure even more concise. XADD exchanges the first operand (destination) with the second operand (source), then loads the sum of the two values into the destination operand. Thus we have

           ;------------------------------------------------------------
000000EC    FibonacciByRegXADD PROC
           ; Receives: ECX as input n
           ; Returns: EAX, nth Fibonacci number
           ;------------------------------------------------------------
000000EC  33 C0         xor   eax,eax
000000EE  BB 00000001   mov   ebx,1
000000F3             L1:
000000F3  0F C1 D8      xadd eax,ebx   ; first exchange and then add
000000F6  E2 FB      loop   L1
000000F8  C3            ret
000000F9    FibonacciByRegXADD ENDP

Two more single-instruction conveniences are the move extensions MOVZX and MOVSX. Also worth mentioning are the bit test instructions BT, BTC, BTR, and BTS. Consider the following example:

.data
  Semaphore WORD 10001000b
.code
  btc Semaphore, 6  ; CF=0, Semaphore WORD 11001000b

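Imagine the instruction set without BTC. One non-atomic implementation of the same logic would be the following. The XOR comes before the SHR because XOR clears CF:

mov ax, Semaphore        ; copy the original value
xor Semaphore,01000000b  ; complement bit 6 in memory
shr ax, 7                ; CF = original bit 6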
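
For readers who think in C first, the effect of BTC can be sketched as below. The helper name btc16_model is hypothetical, the return value stands in for CF, and the C code makes no claim of atomicity; it only mirrors the logic.

```c
#include <stdint.h>

/* C model of BTC on a 16-bit operand (hypothetical helper).
   Returns the old value of the selected bit, which is what BTC leaves
   in CF, and complements that bit in place. Assumes bit < 16.
   This C version is not atomic. */
unsigned btc16_model(uint16_t *word, unsigned bit)
{
    uint16_t mask = (uint16_t)(1u << bit);
    unsigned old = (*word & mask) != 0;   /* bit value before the flip */

    *word = (uint16_t)(*word ^ mask);     /* complement the bit */
    return old;
}
```

Starting from 10001000b and bit 6, the model returns 0 and leaves 11001000b, the same result as the comment on the btc line.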

Little-endian

An x86 processor stores and retrieves data from memory using little-endian order (low to high). The least significant byte is stored at the first memory address allocated for the data. The remaining bytes are stored in the next consecutive memory positions.

6. Memory representations

Consider the following data definitions:

.data
dw1 DWORD 12345678h
dw2 DWORD 'AB', '123', 123h
;dw3 DWORD 'ABCDE'  ; error A2084: constant value too large
by3 BYTE 'ABCDE', 0FFh, 'A', 0Dh, 0Ah, 0
w1 WORD 123h, 'AB', 'A'

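For simplicity, hexadecimal constants are used as initializers. The memory representation, byte by byte from the lowest address, is as follows:

dw1   78 56 34 12
dw2   42 41 00 00   33 32 31 00   23 01 00 00
by3   41 42 43 44 45 FF 41 0D 0A 00
w1    23 01   42 41   41 00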

As for multi-byte DWORD and WORD data, they are stored in little-endian order. Based on this, the second DWORD initialized with 'AB' is 00004142h and the next, '123', is 00313233h, each keeping its characters in their original order within the value. You can’t initialize dw3 as 'ABCDE', which contains five bytes 4142434445h, but you can initialize by3 that way in byte memory, since little-endian ordering does not apply to single bytes. Similarly, see w1 for WORD memory.
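
The byte order can be made explicit with a small C sketch. The helper store_le32 is a made-up name; it writes the bytes in little-endian order by hand, so it shows the x86 layout regardless of the machine it runs on.

```c
#include <stdint.h>

/* Write a 32-bit value into four bytes in little-endian order,
   the way an x86 processor stores a DWORD (hypothetical helper). */
void store_le32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value);         /* least significant byte first */
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);   /* most significant byte last */
}
```

For 12345678h this yields the bytes 78 56 34 12, and for 00004142h (the DWORD 'AB') it yields 42 41 00 00.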

7. A code error hidden by little-endian

Continuing from the XADD example in the last section, we try to fill a byte array with the first 7 Fibonacci numbers, 01, 01, 02, 03, 05, 08, 0D. The following simple implementation has a bug. The bug does not show up as an error immediately because it is hidden by little-endian ordering.

FibCount = 7
.data
FibArray BYTE FibCount DUP(0ffh)
BYTE 'ABCDEF' 

.code
   mov  edi, OFFSET FibArray       
   mov  eax,1             
   xor  ebx,ebx          
   mov  ecx, FibCount        
 L1:
   mov  [edi], eax                
   xadd eax, ebx                      
   inc  edi                  
 loop L1

To debug, I purposely placed the bytes 'ABCDEF' right after the byte array FibArray, which is initialized with seven 0ffh values. The initial memory looks like this:

FF FF FF FF FF FF FF 41 42 43 44 45 46

Let’s set a breakpoint in the loop. When the first number 01 is filled in, it is followed by three zeros:

01 00 00 00 FF FF FF 41 42 43 44 45 46

That seems OK, since the second number 01 fills the second byte and overwrites the zeros left by the first, and so on, until the seventh number 0D fits into the last byte:

01 01 02 03 05 08 0D 00 00 00 44 45 46

FibArray holds the expected result because of little-endian ordering. Only when you define some memory immediately after FibArray do you see that its first three bytes are overwritten by zeros, as here 'ABCDEF' loses 'ABC' and only 'DEF' survives. How would you make an easy fix?
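
The overwrite can be reproduced in a C model of the loop. This is a sketch under assumptions: fill_fib_model and put_dword_le are hypothetical names, a and b stand in for EAX and EBX, and the flag dword_store selects between the DWORD store of the buggy code and a one-byte store, which is one possible fix (the equivalent of storing AL instead of EAX).

```c
#include <stdint.h>
#include <string.h>

enum { FIB_COUNT = 7 };

/* Store a DWORD in little-endian order, like mov [edi], eax. */
static void put_dword_le(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)v;
    dst[1] = (uint8_t)(v >> 8);
    dst[2] = (uint8_t)(v >> 16);
    dst[3] = (uint8_t)(v >> 24);
}

/* Model of the fill loop. With dword_store != 0, buf must have at least
   FIB_COUNT + 3 bytes, because the last store spills 3 bytes past the array. */
void fill_fib_model(uint8_t *buf, int dword_store)
{
    uint32_t a = 1, b = 0;                   /* mov eax,1 / xor ebx,ebx */

    for (int i = 0; i < FIB_COUNT; ++i) {    /* loop L1, edi = buf + i */
        if (dword_store)
            put_dword_le(buf + i, a);        /* buggy: writes 4 bytes */
        else
            buf[i] = (uint8_t)a;             /* possible fix: write 1 byte */

        uint32_t sum = a + b;                /* xadd eax, ebx */
        b = a;
        a = sum;
    }
}
```

With a 13-byte buffer holding seven FF bytes followed by 'ABCDEF', the DWORD variant ends with 00 00 00 where 'ABC' used to be, while the byte variant leaves 'ABCDEF' untouched.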

About runtime stack

The runtime stack is a memory array directly managed by the CPU, with the stack pointer register ESP holding a 32-bit offset into the stack. ESP is modified by instructions such as CALL, RET, PUSH, and POP. When you use PUSH and POP or the like, you explicitly change the stack contents. Be very careful not to disturb other implicit uses, like CALL and RET, because you, the programmer, and the system share the same runtime stack.

8. Assignment with PUSH and POP is not efficient

In assembly code, you definitely can make use of the stack to do the assignment previous = current, as in FibonacciByMemory. The following is FibonacciByStack, where the only difference is using PUSH and POP instead of the two MOV instructions with EDX.

;------------------------------------------------------------
FibonacciByStack PROC
; Receives: ECX as input n 
; Returns: EAX, nth Fibonacci number
;------------------------------------------------------------
   mov   eax,1         
   mov   previous,0         
   mov   current,0         
L1:
   add  eax,previous      ; eax = current + previous     
   push current           ; previous = current
   pop  previous
   mov  current, eax
loop   L1
   ret
FibonacciByStack ENDP

As you can imagine, the runtime stack, built on memory, is much slower than registers. If you create a benchmark to compare the above procedures in a long loop, you’ll find that FibonacciByStack is the least efficient. My suggestion is that if you can use a register or memory, don’t use PUSH and POP.

9. Using INC to avoid PUSHFD and POPFD

When you use the instruction ADC or SBB to add or subtract an integer with the previous carry, you reasonably want to preserve the previous carry flag (CF) with PUSHFD and POPFD, since an address update with ADD will overwrite the CF. The following Extended_Add example, borrowed from the textbook [2], calculates the sum of two extended long integers BYTE by BYTE:

;--------------------------------------------------------
Extended_Add PROC
; Receives: ESI and EDI point to the two long integers
;           EBX points to an address that will hold sum
;           ECX indicates the number of BYTEs to be added
; Returns:  EBX points to an address of the result sum
;--------------------------------------------------------
   clc                      ; clear the Carry flag
   L1:
      mov   al,[esi]        ; get the first integer
      adc   al,[edi]        ; add the second integer
      pushfd                ; save the Carry flag

      mov   [ebx],al        ; store partial sum
      add   esi, 1          ; point to next byte   
      add   edi, 1
      add   ebx, 1          ; point to next sum byte   
      popfd                 ; restore the Carry flag
   loop   L1                ; repeat the loop

   mov   byte ptr [ebx],0   ; clear high byte of sum
   adc   byte ptr [ebx],0   ; add any leftover carry
   ret
Extended_Add ENDP

As we know, the INC instruction increments by 1 without affecting the CF. So we can replace the ADD instructions above with INC and drop PUSHFD and POPFD. The loop simplifies to this:

L1:
   mov   al,[esi]        ; get the first integer
   adc   al,[edi]        ; add the second integer

   mov   [ebx],al        ; store partial sum
   inc   esi             ; add one without affecting CF
   inc   edi
   inc   ebx
loop   L1                ; repeat the loop

Now you might ask what to do when summing two long integers DWORD by DWORD, where each iteration must advance the addresses by 4 bytes (TYPE DWORD). We can still use INC, this time on an index register EBX scaled by TYPE DWORD, with EDX pointing to the sum:

clc
xor   ebx, ebx

L1:
    mov eax, [esi +ebx*TYPE DWORD]
    adc eax, [edi +ebx*TYPE DWORD]
    mov [edx +ebx*TYPE DWORD], eax
    inc ebx
loop  L1

Applying a scaling factor like this is more general and preferred. Similarly, wherever necessary, you can also use the DEC instruction, which decrements by 1 without affecting the carry flag.
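
The carry chain that ADC maintains in CF can be written out explicitly in C. The following is a sketch only: extended_add_model is a hypothetical name, the variable carry stands in for CF, and the sum buffer is assumed to have room for n + 1 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* C model of the byte-by-byte extended addition (hypothetical helper).
   x and y are n-byte little-endian integers; sum receives n + 1 bytes. */
void extended_add_model(const uint8_t *x, const uint8_t *y,
                        uint8_t *sum, size_t n)
{
    unsigned carry = 0;                              /* clc */

    for (size_t i = 0; i < n; ++i) {
        unsigned t = (unsigned)x[i] + y[i] + carry;  /* adc al,[edi] */
        sum[i] = (uint8_t)t;                         /* store partial sum */
        carry = t >> 8;                              /* new CF: 0 or 1 */
    }
    sum[n] = (uint8_t)carry;                         /* leftover carry */
}
```

In C the carry lives in an ordinary variable, so index updates cannot disturb it. In assembly the same role is played by CF, which is why the choice between ADD and INC matters there.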

10. Another good reason to avoid PUSH and POP

Since you and the system share the same stack, be very careful not to disturb the system’s use of it. If you forget to pair every PUSH with a POP, an error can occur when the procedure returns, especially when a conditional jump skips the POP.

The following Search2DAry searches a 2-dimensional array for a value passed in EAX. If it is found, simply jump to the FOUND label returning one in EAX as true, else set EAX zero as false.

;------------------------------------------------------------
Search2DAry PROC
; Receives: EAX, a byte value to search a 2-dimensional array
;           ESI, an address to the 2-dimensional array
; Returns: EAX, 1 if found, 0 if not found
;------------------------------------------------------------
   mov  ecx,NUM_ROW        ; outer loop count

ROW:   
   push ecx                ; save outer loop counter
   mov  ecx,NUM_COL        ; inner loop counter

   COL:   
      cmp al, [esi+ecx-1]
      je FOUND   
   loop COL

   add esi, NUM_COL
   pop  ecx                ; restore outer loop counter
loop ROW                   ; repeat outer loop

   mov eax, 0
   jmp QUIT
FOUND: 
   mov eax, 1
QUIT:
   ret
Search2DAry ENDP

Let’s call it in main, preparing the argument ESI to point to the array and the search value EAX to be 31h or 30h for the not-found or found test case, respectively:

.data
ary2D   BYTE  10h,  20h,  30h,  40h,  50h
        BYTE  60h,  70h,  80h,  90h,  0A0h
NUM_COL = 5
NUM_ROW = 2

.code
main PROC
   mov esi, OFFSET ary2D
   mov eax, 31h            ; crash if set 30h 
   call Search2DAry
; See eax for search result
   exit
main ENDP

Unfortunately, it only works in the not-found case for 31h. A crash occurs for a successful search like 30h, because the pushed outer loop counter is left on the stack. Sadly enough, that leftover is popped by RET and becomes the return address to the caller.

Therefore, it’s better to use a register or variable to save the outer loop counter here. Although the logic error remains, a crash will not happen because the system’s use of the stack is not disturbed. As a good exercise, you can try to fix it.

Assembling time vs. runtime

I would like to say more about this feature of assembly language. Preferably, if you can do something at assembling time, don’t do it at runtime. Organizing logic at assembling time means doing the job statically (at compilation time) without consuming runtime. Unlike in high-level languages, all operators in assembly language, such as +, -, *, and /, are processed at assembling time, while only instructions, like ADD, SUB, MUL, and DIV, work at runtime.

11. Implementing with plus (+) instead of ADD

Let’s redo the Fibonacci calculation to implement eax = ebx + edx with the plus operator, with the help of the LEA instruction. The following is FibonacciByRegLEA, adapted from FibonacciByRegMOV with LEA replacing ADD.

;------------------------------------------------------------
FibonacciByRegLEA PROC
; Receives: ECX as input n 
; Returns: EAX, nth Fibonacci number
;------------------------------------------------------------
   xor   eax,eax         
   xor   ebx,ebx      
   mov   edx,1      
L1:
   lea  eax, DWORD PTR [ebx+edx]  ; eax = ebx + edx
   mov  edx,ebx
   mov  ebx,eax
loop   L1

   ret
FibonacciByRegLEA ENDP

This statement is encoded in three bytes of machine code, with no explicit ADD instruction at runtime:

000000CE  8D 04 1A      lea eax, DWORD PTR [ebx+edx]  ; eax = ebx + edx

This example doesn’t make much performance difference compared to FibonacciByRegMOV, but it is enough as an implementation demo.

12. If you can use an operator, don’t use an instruction

For an array defined as:

.data
   Ary1 DWORD 20 DUP(?)

If you want to traverse it from the second element to the middle one, you might think of this, as in other languages:

mov esi, OFFSET Ary1
add esi, TYPE DWORD    ; start at the second value 
mov ecx, LENGTHOF Ary1 ; total number of values
sub ecx, 1
shr ecx, 1             ; halve the loop counter
L1:
   ; do traversing
loop L1

Remember that ADD, SUB, and SHR (here the division by 2) act dynamically at runtime. If you know the values in advance, there is no need to calculate them at runtime. Instead, apply operators at assembling time:

mov esi, OFFSET Ary1 + TYPE DWORD   ; start at the second
mov ecx, (LENGTHOF Ary1 -1)/2       ; set loop counter
L1:
   ; do traversing
loop L1

This saves three instructions in the code segment at runtime. Next, let’s save memory in the data segment.

13. If you can use a symbolic constant, don’t use a variable

Like operators, all directives are processed at assembling time. A variable consumes memory and has to be accessed at runtime. As for the last Ary1, you may want to remember its size in bytes and its number of elements like this:

.data
   Ary1 DWORD 20 DUP(?)
   arySizeInByte DWORD ($ - Ary1)  ; 80
   aryLength DWORD LENGTHOF Ary1   ; 20

It is correct but not preferred because it uses two variables. Why not simply make them symbolic constants and save the memory of two DWORDs?

.data
   Ary1 DWORD 20 DUP(?)
   arySizeInByte = ($ - Ary1)      ; 80
   aryLength EQU LENGTHOF Ary1     ; 20

Using either the equal sign or the EQU directive is fine. The constant is just a replacement made during code preprocessing.

14. Generating the memory block in macro

For an amount of data to initialize, if you already know the logic that creates it, you can use a macro to generate the memory block at assembling time instead of at runtime. The following macro creates all 47 Fibonacci numbers that fit in 32 bits, in a DWORD array named FibArray:

.data
val1 = 1
val2 = 1
val3 = val1 + val2 

FibArray LABEL DWORD
DWORD val1                ; first two values
DWORD val2
WHILE val3 LT 0FFFFFFFFh  ; less than 4-billion, 32-bit
   DWORD val3             ; generate unnamed memory data
   val1 = val2
   val2 = val3
   val3 = val1 + val2
ENDM

Because the macro is processed statically by the assembler, this saves considerable initialization at runtime, as opposed to the FibonacciByXXX procedures mentioned before.
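
As a sanity check on the count of 47, the WHILE loop can be mirrored in C. Unlike the macro, this sketch runs at runtime; it only reproduces the arithmetic. The name build_fib_table is hypothetical, and 64-bit intermediates are an assumption made here so that the comparison against 0FFFFFFFFh cannot wrap.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the WHILE macro loop (hypothetical helper): fills table with
   the Fibonacci numbers below 0xFFFFFFFF and returns how many were written.
   cap is the capacity of table; at least 2 entries are required. */
size_t build_fib_table(uint32_t *table, size_t cap)
{
    uint64_t val1 = 1, val2 = 1;
    uint64_t val3 = val1 + val2;
    size_t n = 0;

    if (cap < 2)
        return 0;
    table[n++] = (uint32_t)val1;              /* first two values */
    table[n++] = (uint32_t)val2;

    while (val3 < 0xFFFFFFFFull && n < cap) { /* WHILE val3 LT 0FFFFFFFFh */
        table[n++] = (uint32_t)val3;          /* DWORD val3 */
        val1 = val2;
        val2 = val3;
        val3 = val1 + val2;
    }
    return n;
}
```

With a table of 64 entries the function returns 47, and the last entry written is 2971215073, the largest Fibonacci number that fits in an unsigned 32-bit value.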

For more about macros in MASM, see my article Something You May Not Know About the Macro in MASM [4]. I also reverse engineered the switch statement in the VC++ compiler implementation. Interestingly, under some conditions the switch statement uses a binary search without exposing the prerequisite sorting at runtime. It’s reasonable to think that the preprocessor sorts all the known case values during compilation. This static sorting behavior (as opposed to dynamic behavior at runtime) could be implemented with a macro procedure, directives, and operators. For details, please see Something You May Not Know About the Switch Statement in C/C++ [5].

About loop design

Almost every language provides an unconditional jump like GOTO, but most of us rarely use it, following software engineering principles. Instead, we use alternatives like break and continue. In assembly language, however, we rely more on jumps, either conditional or unconditional, to control the workflow more freely. In the following sections, I list some ill-coded patterns.

15. Encapsulating all loop logic in the loop body

To construct a loop, try to keep all your loop contents in the loop body. Don’t jump out to do something and then jump back into the loop. The example here traverses a one-dimensional integer array. If it finds an odd number, it increments it; otherwise it does nothing.

Two unclear solutions that still produce the correct result might look like the following, the first placing the increment after the loop and the second placing it before the loop:

   mov ecx, LENGTHOF array
   xor esi, esi
L1: 
   test array[esi], 1
   jnz ODD
PASS:
   add esi, TYPE DWORD
loop L1
   jmp DONE

ODD: 
  inc array[esi]
jmp PASS
DONE:

   mov ecx, LENGTHOF array
   xor esi, esi
   jmp L1

ODD: 
  inc array[esi]
jmp PASS

L1: 
   test array[esi], 1
   jnz ODD
PASS:
   add esi, TYPE DWORD
loop L1

However, both do the incrementing outside the loop and then jump back. They make the check in the loop, but the first does the incrementing after the loop and the second does it before the loop. For simple logic, you may not think like this, but for a complicated problem, assembly language can lead you astray into such a spaghetti pattern. The following is a good one, which encapsulates all logic in the loop body and is concise, readable, maintainable, and efficient.

   mov ecx, LENGTHOF array
   xor esi, esi
L1: 
   test array[esi], 1
   jz PASS
   inc array[esi]
PASS:
   add esi, TYPE DWORD
loop L1

16. Loop entrance and exit

A loop with one entrance and one exit is usually preferred. But if necessary, two or more conditional exits are fine, as shown in Search2DAry with its found and not-found results.

The following is a bad pattern with two entrances, where one path gets into START via initialization and another goes directly to MIDDLE. Such code is pretty hard to understand. The loop logic needs to be reorganized or refactored.

   ; do something
   je MIDDLE

   ; loop initialization
START: 
   ; do something

MIDDLE:
   ; do something
loop START

The following is a bad pattern with two loop ends, where some logic leaves through the first loop end while the rest exits at the second. Such code is quite confusing. Try to rework it with a jump to a label so that there is a single loop end.

   ; loop initialization
START2: 
   ; do something
   je NEXT
   ; do something
loop START2
   jmp DONE

NEXT:
   ; do something
loop START2
DONE:

17. Don’t change ECX in the loop body

The register ECX acts as a loop counter and its value is implicitly decremented by the LOOP instruction. You can read ECX and make use of its value in each iteration. As seen in Search2DAry in the previous section, we compare the indirect operand [ESI+ECX-1] with AL. But never try to change the loop counter within the loop body, which makes code hard to understand and hard to debug. A good practice is to think of the loop counter ECX as read-only.

   ; do initialization
   mov ecx, 10
L1: 
   ; do something
   mov eax, ecx                      ; fine
   mov ebx, [esi +ecx *TYPE DWORD]   ; fine
   mov ecx, edx                      ; not good 
   inc ecx                           ; not good
   ; do something
loop L1

18. When jump backward…

Besides the LOOP instruction, assembly language programming relies heavily on conditional or unconditional jumps to create a loop when the count is not known before the loop. In principle, any backward jump can be considered a loop. Assume that jx and jy are the desired jump or LOOP instructions. The following backward jy L2, nested in the jx L1 loop, would be thought of as an inner loop.

; loop initialization 
L1: 
   ; do something
 L2: 
   ; do something
 jy L2
   ; do something
jx L1

To have if-then-else selection logic, it’s reasonable to use a forward jump like this for branching within the jx L1 iteration:

; loop initialization 
L1: 
   ; do something
 jy TrueLogic
   ; do something for false
   jmp DONE
 TrueLogic:
   ; do something for true
DONE:
   ; do something
jx L1

About procedure

Similar to functions in C/C++, procedures in assembly language have some basics worth discussing.

19. Making a clear calling interface

When designing a procedure, we hope to make it as reusable as possible. Make it perform only one task, without extras like I/O. The procedure’s caller should take responsibility for input and output. The caller should communicate with the procedure only through arguments and parameters. The procedure should use only its parameters in its logic, without referring to outside definitions such as:

  • Global variable and array
  • Global symbolic constant

Implementing with such definitions makes your procedure un-reusable.

Recalling the previous FibonacciByXXX procedures, we use register ECX as both argument and parameter, with the return value in EAX, to make a clear calling interface:

;------------------------------------------------------------
FibonacciByXXX PROC
; Receives: ECX as input n 
; Returns: EAX, nth Fibonacci number
;------------------------------------------------------------

Now the caller can do something like

; Read user’s input n and save in ECX
call FibonacciByXXX
; Output or process the nth Fibonacci number in EAX

As a second example, let’s take another look at calling Search2DAry from the previous section. The register arguments ESI and EAX are prepared so that the implementation of Search2DAry doesn’t refer directly to the global array ary2D.

... ...
NUM_COL = 5
NUM_ROW = 2

.code
main PROC
   mov esi, OFFSET ary2D
   mov eax, 31h 
   call Search2DAry
; See eax for search result
   exit
main ENDP

;------------------------------------------------------------
Search2DAry PROC
; Receives: EAX, a byte value to search a 2-dimensional array
;           ESI, an address to the 2-dimensional array
; Returns: EAX, 1 if found, 0 if not found
;------------------------------------------------------------
   mov  ecx,NUM_ROW        ; outer loop count
... ...
   mov  ecx,NUM_COL        ; inner loop counter
... ...

Unfortunately, the weakness is that its implementation still uses the two global constants NUM_ROW and NUM_COL, which prevents it from being called elsewhere. To improve this, supplying two more register arguments would be an obvious way, or see the next section.

20. INVOKE vs. CALL

Besides the CALL instruction from Intel, MASM provides the 32-bit INVOKE directive to make a procedure call easier. With a bare CALL instruction as shown above, arguments are passed in registers. The problem is that the number of registers is limited. All registers are global, and you probably have to save registers before calling and restore them afterward. The INVOKE directive gives a procedure the form of a parameter list, as you have experienced in high-level languages.

Consider Search2DAry with a parameter list that doesn’t refer to the global constants NUM_ROW and NUM_COL. We can have its prototype like this:

;---------------------------------------------------------------------
Search2DAry PROTO, pAry2D: PTR BYTE, val: BYTE, nRow: WORD, nCol: WORD 
; Receives: pAry2D, an address to the 2-dimensional array
;           val, a byte value to search a 2-dimensional array 
;           nRow, the number of rows 
;           nCol, the number of columns
; Returns: EAX, 1 if found, 0 if not found
;---------------------------------------------------------------------

Again, as an exercise, you can try to implement this as a fix. Now you just call

INVOKE Search2DAry, ADDR ary2D, 31h, NUM_ROW, NUM_COL
; See eax for search result

Likewise, to construct a procedure with a parameter list, you still need to follow the rule of not referring to global variables and constants. Also pay attention to the following:

  • The entire calling interface should go only through the parameter list, without referring to any register values set outside the procedure.
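
To illustrate what such an interface buys, here is one possible shape of the search in C, where everything the routine needs arrives through its parameter list. It is a sketch, not the MASM solution to the exercise: the name search_2d is hypothetical, and the parameters mirror pAry2D, val, nRow, and nCol from the prototype above. It also keeps a single exit.

```c
#include <stddef.h>
#include <stdint.h>

/* Search an n_row x n_col byte array for val (hypothetical helper).
   Returns 1 if found, 0 if not. No globals, one exit point. */
int search_2d(const uint8_t *ary, uint8_t val, size_t n_row, size_t n_col)
{
    int found = 0;

    for (size_t r = 0; r < n_row && !found; ++r) {      /* outer loop: rows */
        for (size_t c = 0; c < n_col && !found; ++c) {  /* inner loop: columns */
            if (ary[r * n_col + c] == val)
                found = 1;                              /* both loops stop */
        }
    }
    return found;                                       /* single exit */
}
```

Because the dimensions are parameters, the same routine can serve any byte array, which is the reusability argued for above.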

21. Call-by-Value vs. Call-by-Reference

Also be aware that a parameter list should not be too long. If it is, use an object parameter instead. I assume you fully understand the function concept, call-by-value, and call-by-reference in high-level languages. By learning about the stack frame in assembly language, you understand more about the low-level function calling mechanism. Usually, for an object argument, we prefer passing a reference (the object’s address) rather than copying the whole object onto the stack.

To demonstrate this, let’s create a procedure to write month, day, and year from an object of the Win32 SYSTEMTIME structure.

The following is the version of call-by-value, where we use the dot operator to retrieve individual WORD field members from the DateTime object and extend their 16-bit values to 32-bit EAX:

;--------------------------------------------------------
WriteDateByVal PROC, DateTime:SYSTEMTIME
; Receives: DateTime, an object of SYSTEMTIME
;--------------------------------------------------------
   movzx eax, DateTime.wMonth
   ; output eax as month
   ; output a separator like '/' 
   movzx eax, DateTime.wDay
   ; output eax as day
   ; output a separator like '/' 
   movzx eax, DateTime.wYear
   ; output eax as year
   ; make a newline
   ret
WriteDateByVal ENDP

The call-by-reference version is not so straightforward, since it receives an object address. Unlike C/C++ with its arrow (->) pointer operator, we have to load the pointer (address) value into a 32-bit register like ESI. Using ESI as an indirect operand, we must cast its memory back to the SYSTEMTIME type. Then we can get the object members with the dot:

;--------------------------------------------------------
WriteDateByRef PROC, datetimePtr: PTR SYSTEMTIME
; Receives: datetimePtr, an address of SYSTEMTIME object
;--------------------------------------------------------
   mov esi, datetimePtr
   movzx eax, (SYSTEMTIME PTR [esi]).wMonth
   ; output eax as month
   ; output a separator like '/'
   movzx eax, (SYSTEMTIME PTR [esi]).wDay
   ; output eax as day
   ; output a separator like '/' 
   movzx eax, (SYSTEMTIME PTR [esi]).wYear
   ; output eax as year
   ; make a newline
   ret
WriteDateByRef ENDP

You can watch the stack frame of the argument passed for the two versions at runtime. For WriteDateByVal, eight WORD members are copied onto the stack and consume sixteen bytes, while WriteDateByRef needs only four bytes for a 32-bit address. The difference becomes significant for a large structure object.
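
The same contrast can be seen from the C side. In this sketch, SystemTimeModel is a hypothetical stand-in for the Win32 SYSTEMTIME structure (eight 16-bit fields in the documented order), and the two function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for Win32 SYSTEMTIME: eight WORD members. */
typedef struct {
    uint16_t wYear, wMonth, wDayOfWeek, wDay;
    uint16_t wHour, wMinute, wSecond, wMilliseconds;
} SystemTimeModel;

/* Call-by-value: the whole 16-byte object is copied for the call. */
unsigned month_by_val(SystemTimeModel dt)
{
    return dt.wMonth;            /* dot operator on the copy */
}

/* Call-by-reference: only the object's address is passed. */
unsigned month_by_ref(const SystemTimeModel *dt)
{
    return dt->wMonth;           /* arrow operator through the pointer */
}
```

Here sizeof(SystemTimeModel) is 16, matching the sixteen bytes copied for WriteDateByVal, while the pointer passed to month_by_ref is a single address (four bytes in a 32-bit build).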

22. Avoid multiple RET

When you construct a procedure, it’s ideal to keep all of its logic within the procedure body, with one entrance and one exit. In assembly language, a procedure name, like any label, directly represents a memory address, so it is possible to jump straight to a label or a procedure without using CALL or INVOKE. Since such an abnormal entry is quite rare, I am not going to discuss it here.

Although multiple returns are sometimes used in examples in other languages, I don’t encourage this pattern in assembly code. Multiple RET instructions can make your logic hard to understand and debug. The first listing below, MultiRetEx, is such an example with branching. The second, SingleRetEx, instead has a label QUIT at the end and jumps there to make a single exit, where you can do the common chores once and avoid repeated code.

MultiRetEx PROC
   ; do something 
   jx NEXTx
   ; do something
   ret

NEXTx: 
   ; do something
   jy NEXTy
   ; do something
   ret

NEXTy: 
   ; do something
   ret
MultiRetEx ENDP

SingleRetEx PROC
   ; do something 
   jx NEXTx
   ; do something
   jmp QUIT
NEXTx: 
   ; do something
   jy NEXTy
   ; do something
   jmp QUIT
NEXTy: 
   ; do something
QUIT:
   ; do common things
   ret
SingleRetEx ENDP
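The single-exit shape is the same idiom often used in C with a goto to a common exit label. The sketch below mirrors the control flow of SingleRetEx. The function name, the conditions, and the exit_count counter are hypothetical and stand in for the elided "do something" parts.

```c
/* Counts how many times the shared exit code has run (illustration only). */
int exit_count = 0;

int classify_single_exit(int n) {
    int result = 0;
    if (n < 0) goto next_x;      /* plays the role of "jx NEXTx" */
    result = 1;
    goto quit;
next_x:
    if (n < -100) goto next_y;   /* plays the role of "jy NEXTy" */
    result = -1;
    goto quit;
next_y:
    result = -2;
quit:
    exit_count++;                /* common work, written once at the single exit */
    return result;
}
```

Every path reaches quit, so the common work cannot be forgotten on one branch, which is the point of the single RET.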

Object data members

Similar to the SYSTEMTIME structure above, we can also create our own type, including a nested one:

Rectangle STRUCT
   UpperLeft COORD <>
   LowerRight COORD <>
Rectangle ENDS

.data
rect Rectangle { {10,20}, {30,50} }

The Rectangle type contains two COORD members, UpperLeft and LowerRight. The Win32 COORD contains two WORD (SHORT) members, X and Y. We can access the data members of the object rect with the dot operator, from either a direct or an indirect operand, like this:

; directly access
mov rect.UpperLeft.X, 11

; cast indirect operand to access
mov esi,OFFSET rect
mov (Rectangle PTR [esi]).UpperLeft.Y, 22

; use the OFFSET operator for embedded members
mov esi,OFFSET rect.LowerRight
mov (COORD PTR [esi]).X, 33
mov esi,OFFSET rect.LowerRight.Y
mov WORD PTR [esi], 55

By using the OFFSET operator, we access different data members with different type casts. Recall that every operator is evaluated statically, at assembly time. What if we want to retrieve a data member’s address (not its value) at runtime?

23. Indirect operand and LEA

For an indirect operand pointing to an object, you can’t use the OFFSET operator to get a member’s address, because OFFSET can only take the address of a variable defined in the data segment.

There could be a scenario where we pass an object reference to a procedure like WriteDateByRef in the previous section, but want to retrieve the address (not the value) of one of its members. Let’s use the rect object above as an example. The second use of OFFSET in the following code does not assemble:

mov esi,OFFSET rect
mov edi, OFFSET (Rectangle PTR [esi]).LowerRight

Let’s ask for help from the LEA instruction that you have seen in FibonacciByRegLEA in the previous section. The LEA instruction calculates and loads the effective address of a memory operand. It is similar to the OFFSET operator, except that only LEA can obtain an address calculated at runtime:

mov esi,OFFSET rect
lea edi, (Rectangle PTR [esi]).LowerRight
mov ebx, OFFSET rect.LowerRight

lea edi, (Rectangle PTR [esi]).UpperLeft.Y
mov ebx, OFFSET rect.UpperLeft.Y

mov esi,OFFSET rect.UpperLeft
lea edi, (COORD PTR [esi]).Y

I purposely use EBX here to get each address statically, so you can verify that it matches the address in EDI, which is loaded dynamically from the indirect operand ESI at runtime.
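For readers who think in C, the static and runtime views correspond roughly to offsetof and to taking a member’s address through a pointer. This is only a sketch: CoordSketch and RectSketch are hypothetical mirrors of COORD and Rectangle, not the Win32 types.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors of COORD and the article's Rectangle. */
typedef struct { int16_t X, Y; } CoordSketch;
typedef struct { CoordSketch UpperLeft, LowerRight; } RectSketch;

/* Runtime address of a member through a pointer, comparable to
   lea edi, (Rectangle PTR [esi]).LowerRight */
CoordSketch *lower_right_of(RectSketch *r) { return &r->LowerRight; }

/* The member's displacement, a constant known at compile time;
   LEA adds this kind of constant to the register at runtime. */
size_t lower_right_displacement(void) { return offsetof(RectSketch, LowerRight); }
```

With a RectSketch at any address, lower_right_of returns that address plus 4, which matches the displacement the assembler encodes in the LEA instruction.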

About system I/O

From Computer Memory Basics, we know that I/O operations through the operating system are quite slow. Input and output are usually measured in milliseconds, compared with nanoseconds or microseconds for register and memory access. To be more efficient, it is worth trying to reduce the number of system API calls, by which I mean Win32 API calls here. For details about the Win32 functions mentioned below, please refer to MSDN.

24. Reducing system I/O API calls

An example is to output 20 lines of 50 random characters with random colors.

We can certainly generate and output one character at a time, by using SetConsoleTextAttribute and WriteConsole. Simply set its color by

INVOKE SetConsoleTextAttribute, consoleOutHandle, wAttributes

Then write that character by

INVOKE WriteConsole,
   consoleOutHandle,    ; console output handle
   OFFSET buffer,       ; points to string
   1,                   ; string length
   OFFSET bytesWritten, ; returns number of bytes written
   0

After writing 50 characters, start a new line. So we can create a nested iteration, with the outer loop for 20 rows and the inner loop for 50 columns. With 50 by 20 characters, we call each of these two console output functions 1000 times.

However, another pair of API functions can be more efficient, by writing 50 characters in a row and setting their colors all at once. They are WriteConsoleOutputAttribute and WriteConsoleOutputCharacter. To make use of them, let’s create two procedures:

;-----------------------------------------------------------------------
ChooseColor PROC
; Selects a color with 50% probability of red, 25% green and 25% yellow
; Receives: nothing
; Returns:  AX = randomly selected color

;-----------------------------------------------------------------------
ChooseCharacter PROC
; Randomly selects an ASCII character, from ASCII code 20h to 07Ah
; Receives: nothing
; Returns:  AL = randomly selected character

We call them in a loop to prepare a WORD array bufColor and a BYTE array bufChar for all 50 characters selected. Now we can write the 50 random characters per line with two calls here:

INVOKE WriteConsoleOutputAttribute, 
      outHandle, 
      ADDR bufColor, 
      MAXCOL, 
      xyPos, 
      ADDR cellsWritten

INVOKE WriteConsoleOutputCharacter, 
      outHandle, 
      ADDR bufChar, 
      MAXCOL, 
      xyPos, 
      ADDR cellsWritten

Besides bufColor and bufChar, we define MAXCOL = 50 and xyPos of the COORD type, so that xyPos.y is incremented for each row in a single loop of 20 rows. In total, we call each of these two APIs only 20 times.
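The batching idea itself is not specific to the Win32 console. As a rough, portable analogy only, the C sketch below fills a row buffer first and then issues one output call per row. It uses standard stdio instead of the console functions above and ignores colors. The function names are hypothetical, and since stdio adds its own buffering, the returned count tallies library calls, not system calls.

```c
#include <stdio.h>
#include <stdlib.h>

enum { MAXCOL = 50, MAXROW = 20 };

/* Fill one row with random characters in the range 20h..7Ah. */
void fill_row(char *row, int cols) {
    for (int i = 0; i < cols; i++)
        row[i] = (char)(0x20 + rand() % (0x7A - 0x20 + 1));
}

/* Write all rows, one output call per row; returns the number of output calls. */
int write_rows_buffered(FILE *out, int rows, int cols) {
    char row[MAXCOL + 1];
    int calls = 0;
    if (cols < 0) cols = 0;
    if (cols > MAXCOL) cols = MAXCOL;
    for (int r = 0; r < rows; r++) {
        fill_row(row, cols);
        row[cols] = '\n';                        /* newline travels with the row */
        fwrite(row, 1, (size_t)cols + 1, out);   /* one call for the whole row */
        calls++;
    }
    return calls;
}
```

Calling write_rows_buffered(stdout, MAXROW, MAXCOL) makes 20 output calls, where a character-at-a-time loop would make 1000.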

About PTR operator

MASM provides the PTR operator, which is similar to the pointer * used in C/C++. The following is the PTR specification:

  • type PTR expression
    Forces the expression to be treated as having the specified type.
  • [[ distance ]] PTR type
    Specifies a pointer to type.

This means that two usages are available, such as BYTE PTR or PTR BYTE. Let’s discuss how to use them.

25. Defining a pointer, cast and dereference

The following C/C++ code demonstrates which endianness your system uses, little endian or big endian. As an integer type takes four bytes, it casts the array name fourBytes, a char address, to an unsigned int address. Then it displays the integer result by dereferencing the unsigned int pointer.

#include <stdio.h>

int main()
{
   unsigned char fourBytes[] = { 0x12, 0x34, 0x56, 0x78 };
   // Cast the memory pointed by the array name fourBytes, to unsigned int address
   unsigned int *ptr = (unsigned int *)fourBytes;
   printf("1. Directly Cast: n is %Xh\n", *ptr);
   return 0;
}

As expected on an Intel x86 based system, this verifies little endian by showing 78563412 in hexadecimal. We can do the same thing in assembly language with DWORD PTR, which acts like casting an address to a 4-byte DWORD, the unsigned int type.

.data
fourBytes BYTE 12h,34h,56h,78h

.code
mov eax, DWORD PTR fourBytes		; EAX = 78563412h

There is no explicit dereference here, since DWORD PTR combines the four bytes into one DWORD memory operand and lets MOV retrieve it as a direct operand to EAX. This could be considered equivalent to the (unsigned int *) cast.

Now let’s do it another way, by using PTR DWORD. With the same logic as above, this time we first define a DWORD pointer type with TYPEDEF:

DWORD_POINTER TYPEDEF PTR DWORD

This could be considered equivalent to defining the pointer type as unsigned int *. Then in the following data segment, the address variable dwPtr is initialized with the address of fourBytes. Finally in code, EBX holds this address as an indirect operand and makes an explicit dereference to get its DWORD value into EAX.

.data
fourBytes BYTE 12h,34h,56h,78h
dwPtr DWORD_POINTER fourBytes

.code
mov ebx, dwPtr       ; Get DWORD address		
mov eax, [ebx]       ; Dereference, EAX = 78563412h

To summarize, PTR DWORD indicates a DWORD address type used to define (declare) a variable, like a pointer type, while DWORD PTR indicates the memory pointed to by a DWORD address, like a type cast.

26. Using PTR in a procedure

To define a procedure with a parameter list, you might want to use PTR in both ways. The following is such an example to increment each element in a DWORD array:

;---------------------------------------------------------
IncrementArray PROC, pAry:PTR DWORD, count:DWORD
; Receives: pAry  - pointer to a DWORD array
;           count - the array count
; Returns:  pAry, every value in pAry incremented
;---------------------------------------------------------
   mov edi,pAry
   mov ecx,count                      

 L1:
   inc DWORD PTR [edi]
   add edi, TYPE DWORD
 loop L1
   ret
IncrementArray ENDP

As the first parameter pAry is a DWORD address, PTR DWORD is used as the parameter type. In the procedure, when incrementing a value pointed to by the indirect operand EDI, you must tell the assembler the type (size) of that memory by using DWORD PTR.
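For comparison, a C counterpart might look like the sketch below, where the pointer type of the parameter already carries the element size that DWORD PTR has to state explicitly in MASM. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Increment every element; the parameter corresponds to pAry:PTR DWORD. */
void increment_array(uint32_t *ary, size_t count) {
    /* A count of 0 simply skips the loop here. In the MASM version, LOOP
       decrements ECX before testing it, so entering with ECX = 0 would
       wrap around instead of stopping. */
    for (size_t i = 0; i < count; i++)
        ary[i]++;   /* like inc DWORD PTR [edi], then add edi, TYPE DWORD */
}
```

If a count of zero is possible, the MASM procedure would need its own guard before L1, for example a JECXZ to the exit.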

Another example is the earlier mentioned WriteDateByRef, where SYSTEMTIME is a Windows defined structure type.

;--------------------------------------------------------
WriteDateByRef PROC, datetimePtr: PTR SYSTEMTIME
; Receives: datetimePtr, the address of a SYSTEMTIME object
;--------------------------------------------------------
   mov esi, datetimePtr
   movzx eax, (SYSTEMTIME PTR [esi]).wMonth
  ... ...
   ret
WriteDateByRef ENDP

Likewise, we use PTR SYSTEMTIME as the parameter type to define datetimePtr. When ESI receives an address from datetimePtr, it has no knowledge about the memory type just like a void pointer in C/C++. We have to cast it as a SYSTEMTIME memory, so as to retrieve its data members.

Signed and Unsigned

In assembly language programming, you can define an integer variable as either signed (SBYTE, SWORD, and SDWORD) or unsigned (BYTE, WORD, and DWORD). The data ranges, for example for 8 bits, are

  • BYTE: 0 to 255 (00h to FFh), 256 numbers in total
  • SBYTE: half negatives, -128 to -1 (80h to FFh), half non-negatives, 0 to 127 (00h to 7Fh)

From the hardware point of view, arithmetic instructions such as ADD and SUB operate exactly the same on signed and unsigned integers, because the CPU cannot distinguish between signed and unsigned. For example, when we define

.data
   bVal   BYTE   255
   sbVal  SBYTE  -1

Both of them have the 8-bit binary FFh saved in memory or moved to a register. You, as a programmer, are solely responsible for using the correct data type with an instruction and for interpreting the result from the flags affected:

  • The carry flag CF for unsigned integers
  • The overflow flag OF for signed integers

The following sections cover several common tricks and pitfalls.

27. Comparison with conditional jumps

Let’s check the following code to see which label it jumps to:

mov   eax, -1
cmp   eax, 1
ja    L1
jmp   L2

As we know, CMP follows the same logic as SUB but is non-destructive to the destination operand. Using JA means an unsigned comparison, where the destination EAX is FFFFFFFFh, i.e. 4294967295, while the source is 1. Certainly 4294967295 is bigger than 1, so it jumps to L1. Thus, the unsigned conditional jumps such as JA, JB, JAE, JNA, etc. can be remembered by A (Above) or B (Below). An unsigned comparison is determined by CF and the zero flag ZF as shown in the following examples:

CMP result             Destination  Source  ZF(ZR)  CF(CY)
Destination < Source        1          2       0       1
Destination > Source        2          1       0       0
Destination = Source        1          1       1       0

Now let’s take a look at signed comparison with the following code to see where it jumps:

mov   eax, -1
cmp   eax, 1
jg    L1
jmp   L2

The only difference is JG here instead of JA. Using JG means a signed comparison, where the destination EAX is FFFFFFFFh, i.e. -1, while the source is 1. Certainly -1 is smaller than 1, so execution falls through to JMP and goes to L2. Likewise, the signed conditional jumps such as JG, JL, JGE, JNG, etc. can be thought of as G (Greater) or L (Less). A signed comparison is determined by OF and the sign flag SF, together with ZF for equality, as shown in the following 8-bit examples:

CMP result                        Destination  Source  SF(PL)  OF(OV)
Destination < Source (SF != OF)       -2         127      0       1
                                      -2           1      1       0
Destination > Source (SF == OF)      127           1      0       0
                                     127          -1      1       1
Destination = Source                   1           1      ZF = 1
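The flag values in both tables can be reproduced with a small model. The following C sketch is an assumption-laden illustration rather than a description of real hardware: it models only an 8-bit CMP, and the type Flags8 and the functions cmp8, takes_ja, and takes_jg are hypothetical names.

```c
#include <stdint.h>

typedef struct { int zf, cf, sf, of; } Flags8;

/* Model the flags of an 8-bit CMP dst, src (a subtraction whose result is discarded). */
Flags8 cmp8(uint8_t dst, uint8_t src) {
    uint8_t res = (uint8_t)(dst - src);
    Flags8 f;
    f.zf = (res == 0);
    f.cf = (dst < src);                              /* unsigned borrow */
    f.sf = (res >> 7) & 1;                           /* sign bit of the result */
    f.of = (((dst ^ src) & (dst ^ res)) >> 7) & 1;   /* signed overflow */
    return f;
}

/* JA jumps when CF = 0 and ZF = 0; JG jumps when ZF = 0 and SF = OF. */
int takes_ja(Flags8 f) { return f.cf == 0 && f.zf == 0; }
int takes_jg(Flags8 f) { return f.zf == 0 && f.sf == f.of; }
```

For the same bit pattern FFh compared with 1, takes_ja reports a jump (255 is above 1) while takes_jg does not (-1 is not greater than 1), which is the 8-bit version of the two code samples above.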

28. When CBW, CWD, or CDQ mistakenly meets DIV…

As we know, the DIV instruction performs unsigned 8-bit, 16-bit, or 32-bit integer division with the dividend in AX, DX:AX, or EDX:EAX respectively. For unsigned division, you have to clear the upper half by zeroing AH, DX, or EDX before using DIV. For signed division with IDIV, the sign extension instructions CBW, CWD, and CDQ are provided to extend the upper half before using IDIV.

For a positive integer whose highest bit (sign bit) is zero, it makes no difference whether you manually clear the upper part of the dividend or mistakenly use a sign extension, as shown in the following example:

mov eax,1002h
cdq
mov ebx,10h
div ebx  ; Quotient EAX = 00000100h, Remainder EDX = 2

This is fine because 1002h is a small positive number and CDQ makes EDX zero, the same as directly clearing EDX. So if your value is positive and its highest bit is zero, using CDQ and

XOR EDX, EDX

are exactly the same.

However, it doesn’t mean that you can always use CDQ/CWD/CBW with DIV when performing a positive division. Take an 8-bit example, 129/2, where we expect quotient 64 and remainder 1. But if you write this

mov  al, 129
cbw             ; Sign-extend AL into AH: AX = FF81h (negative if signed)
mov  bl,2
div  bl         ; Unsigned DIV: quotient 7FC0h does not fit in AL

Try the above in a debugger to see how an integer division overflow results. If you really want correct unsigned division with DIV, you must write:

mov  al, 129
XOR  ah, ah     ; zero AH to extend AL into AX as unsigned
mov  bl,2
div  bl         ; Quotient AL = 40h,  Remainder AH = 1

On the other hand, if you really want to use CBW, it means that you are performing a signed division. Then you must use IDIV:

mov  al, 129    ; 81h (-127d)
cbw             ; Sign-extend AL into AH: AX = FF81h (-127d)
mov  bl,2
idiv bl         ; Quotient AL = C1h (-63d), Remainder AH = FFh (-1)

As seen here, 81h as a signed byte is decimal -127, so the signed IDIV gives the correct quotient and remainder as above.
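The three cases above can be replayed with a small model of the 8-bit forms of DIV and IDIV. This is a sketch under stated assumptions: the functions div8, idiv8, and cbw are hypothetical helpers written for this illustration, and a return value of 0 stands for the divide error the CPU would raise.

```c
#include <stdint.h>

/* CBW: sign-extend AL into AX (returned as raw bits). */
uint16_t cbw(uint8_t al) {
    return (al & 0x80) ? (uint16_t)(0xFF00u | al) : (uint16_t)al;
}

/* 8-bit DIV: AX / divisor -> AL quotient, AH remainder, all unsigned. */
int div8(uint16_t ax, uint8_t divisor, uint8_t *al, uint8_t *ah) {
    if (divisor == 0) return 0;
    uint16_t q = (uint16_t)(ax / divisor);
    if (q > 0xFF) return 0;              /* quotient does not fit in AL */
    *al = (uint8_t)q;
    *ah = (uint8_t)(ax % divisor);
    return 1;
}

/* 8-bit IDIV: the dividend is given as the signed value held in AX. */
int idiv8(int ax, int8_t divisor, int8_t *al, int8_t *ah) {
    if (divisor == 0) return 0;
    int q = ax / divisor;                /* C truncates toward zero, as IDIV does */
    if (q < -128 || q > 127) return 0;   /* quotient does not fit in AL */
    *al = (int8_t)q;
    *ah = (int8_t)(ax % divisor);
    return 1;
}
```

Here div8(cbw(129), 2, ...) reports the overflow, div8(129, 2, ...) yields quotient 40h with remainder 1, and idiv8(-127, 2, ...) yields -63 with remainder -1, where -127 is the signed reading of FF81h.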

29. Why do 255-1 and 255+(-1) affect CF differently?

To talk about the carry flag CF, let’s take the following two arithmetic calculations:

mov al, 255
sub al, 1      ; AL = FE  CF = 0

mov bl, 255
add bl, -1     ; BL = FE  CF = 1

From a human being’s point of view, they do exactly the same operation, 255 minus 1 with the result 254 (FEh). Likewise, from the hardware point of view, the CPU does the same operation for either calculation, by representing -1 as the two’s complement FFh and then adding it to 255. Now 255 is FFh and the binary format of -1 is also FFh. This is how it is calculated:

   1111 1111
+  1111 1111
-------------
   1111 1110

Remember? A CPU operates exactly the same on signed and unsigned because it cannot distinguish them. A programmer should be able to explain the behavior by the flags affected. Since we are talking about CF, we consider the two calculations as unsigned. The key information is that -1 is FFh, which is 255 in decimal when read as unsigned. So the logical interpretation of CF is

  • For sub al, 1, it means 255 minus 1 resulting in 254, with no borrow needed, so CF = 0
  • For add bl, -1, it means 255 plus 255 resulting in 510, but with a carry of 1 0000 0000b (256) out, 254 is the remainder left in the byte, so CF = 1

In the hardware implementation, CF depends on which instruction is used, ADD or SUB. Here MSB (Most Significant Bit) is the highest bit.

  • For ADD instruction, add bl, -1, directly use the carry out of the MSB, so CF = 1
  • For SUB instruction, sub al, 1, must INVERT the carry out of the MSB, so CF = 0
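Both rules can be written down directly. The C sketch below models 8-bit ADD and SUB with a wider intermediate value so that the carry out of the MSB is visible. The function names are hypothetical, and treating SUB as adding the two’s complement follows the description above.

```c
#include <stdint.h>

/* ADD: CF is the carry out of the MSB (bit 7). */
int add8_cf(uint8_t a, uint8_t b, uint8_t *res) {
    unsigned wide = (unsigned)a + b;
    *res = (uint8_t)wide;
    return (wide >> 8) & 1;
}

/* SUB computed as a + (~b) + 1: CF is the INVERTED carry out of the MSB. */
int sub8_cf(uint8_t a, uint8_t b, uint8_t *res) {
    unsigned wide = (unsigned)a + (uint8_t)~b + 1u;
    *res = (uint8_t)wide;
    return !((wide >> 8) & 1);
}
```

Both sub8_cf(255, 1, ...) and add8_cf(255, 0xFF, ...) leave FEh in the result, but the first returns CF = 0 and the second CF = 1. A real borrow such as sub8_cf(1, 2, ...) returns CF = 1.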

30. How to determine OF?

Now let’s see the overflow flag OF, still with above two arithmetic calculations as this:

mov al, 255
sub al, 1      ; AL = FE  OF = 0

mov bl, 255
add bl, -1     ; BL = FE  OF = 0

Neither of them overflows, so OF = 0. There are two ways to determine OF: the logic rule and the hardware implementation.

Logic viewpoint: The overflow flag is set, OF = 1, only when

  • Two positive operands are added and their sum is negative
  • Two negative operands are added and their sum is positive

As a signed value, 255 is -1 (FFh). The flag OF doesn’t care about ADD or SUB. Our two examples both add -1 and -1 with the result -2. Thus, two negatives are added with the sum still negative, so OF = 0.

Hardware implementation: For non-zero operands,

  • OF = (carry out of the MSB) XOR (carry into the MSB)

Look at our calculation again:

   1111 1111
+  1111 1111
-------------
   1111 1110

The carry out of the MSB is 1 and the carry into the MSB is also 1. Then OF = (1 XOR 1) = 0.

To practice more, the following table enumerates different test cases for your understanding:
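Cases like these can also be checked mechanically. The following C sketch computes OF for an 8-bit ADD in both ways described above, once from the two carries and once from the sign rule, so the two viewpoints can be compared for any pair of operands. The function names are hypothetical.

```c
#include <stdint.h>

/* Hardware view: OF = (carry out of the MSB) XOR (carry into the MSB). */
int add8_of(uint8_t a, uint8_t b) {
    int carry_in  = (((a & 0x7F) + (b & 0x7F)) >> 7) & 1;  /* from bit 6 into bit 7 */
    int carry_out = (((unsigned)a + b) >> 8) & 1;          /* out of bit 7 */
    return carry_in ^ carry_out;
}

/* Logic view: both operands share a sign that the sum does not have. */
int add8_of_by_sign(uint8_t a, uint8_t b) {
    uint8_t sum = (uint8_t)(a + b);
    return ((~(a ^ b) & (a ^ sum)) >> 7) & 1;
}
```

For example, FFh + FFh gives OF = 0, while 7Fh + 01h (127 + 1) and 80h + 80h (-128 + -128) both give OF = 1. Looping over all 65536 operand pairs shows that the two functions always agree.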

Ambiguous "LOCAL" directive

As mentioned previously, the PTR operator has two usages, such as DWORD PTR and PTR DWORD. MASM also provides another confusing directive, LOCAL, whose meaning depends on the context in which the same reserved word is used. The following is the specification from MSDN:

LOCAL localname [[, localname]]…
LOCAL label [[ [count ] ]] [[:type]] [[, label [[ [count] ]] [[type]]]]…

  • In the first directive, within a macro, LOCAL defines labels that are unique to each instance of the macro.
  • In the second directive, within a procedure definition (PROC), LOCAL creates stack-based variables that exist for the duration of the procedure. The label may be a simple variable or an array containing count elements.

This specification is not clear enough to understand. In this section, I’ll expose the essential difference between the two and show two examples using the LOCAL directive, one in a procedure and the other in a macro. To keep things familiar, both examples calculate the nth Fibonacci number, as the earlier FibonacciByMemory does. The main point delivered here is:

  • The variables declared by LOCAL in a macro are NOT local to the macro. They are system generated global variables in the data segment, used to resolve redefinition.
  • The variables created by LOCAL in a procedure are really local variables allocated on the stack frame, with a lifetime limited to the procedure.

For the basic concepts and implementations of the data segment and stack frame, please refer to a textbook or the MASM manual. They deserve several chapters and are not covered here.

31. When LOCAL used in a procedure

The following is a procedure with a parameter n to calculate the nth Fibonacci number, returned in EAX. I let the loop counter ECX take the value of the parameter n. Please compare it with FibonacciByMemory. The logic is the same, with the only difference being the local variables pre and cur here, instead of the global variables previous and current in FibonacciByMemory.

;------------------------------------------------------------
FibonacciByLocalVariable PROC USES ecx edx, n:DWORD 
; Receives: Input n
; Returns: EAX, nth Fibonacci number
;------------------------------------------------------------
LOCAL pre, cur :DWORD

   mov   ecx,n
   mov   eax,1         
   mov   pre,0         
   mov   cur,0         
L1:
   add eax, pre      ; eax = current + previous     
   mov edx, cur 
   mov pre, edx
   mov cur, eax
 loop   L1

   ret
FibonacciByLocalVariable ENDP

The following is the code generated from the VS Disassembly window at runtime. As you can see, each line of assembly source is translated into machine code with the parameter n and two local variables created on the stack frame, referenced by EBP:

   231: ;------------------------------------------------------------
   232: FibonacciByLocalVariable PROC USES ecx edx, n:DWORD 
011713F4 55                   push        ebp  
011713F5 8B EC                mov         ebp,esp  
011713F7 83 C4 F8             add         esp,0FFFFFFF8h  
011713FA 51                   push        ecx  
011713FB 52                   push        edx  
   233: ; Receives: Input n
   234: ; Returns: EAX, nth Fibonacci number
   235: ;------------------------------------------------------------
   236: LOCAL pre, cur :DWORD
   237: 
   238:    mov   ecx,n
011713FC 8B 4D 08             mov         ecx,dword ptr [ebp+8]  
   239:    mov   eax,1         
011713FF B8 01 00 00 00       mov         eax,1  
   240:    mov   pre,0         
01171404 C7 45 FC 00 00 00 00 mov         dword ptr [ebp-4],0  
   241:    mov   cur,0         
0117140B C7 45 F8 00 00 00 00 mov         dword ptr [ebp-8],0  
   242: L1:
   243:    add eax,pre      ; eax = current + previous     
01171412 03 45 FC             add         eax,dword ptr [ebp-4]  
   244:    mov EDX, cur 
01171415 8B 55 F8             mov         edx,dword ptr [ebp-8]  
   245:    mov pre, EDX
01171418 89 55 FC             mov         dword ptr [ebp-4],edx  
   246:    mov cur, eax
0117141B 89 45 F8             mov         dword ptr [ebp-8],eax  
   247:    loop   L1
0117141E E2 F2                loop        01171412  
   248: 
   249:    ret
01171420 5A                   pop         edx  
01171421 59                   pop         ecx  
01171422 C9                   leave  
01171423 C2 04 00             ret         4  
   250: FibonacciByLocalVariable ENDP

While FibonacciByLocalVariable is running, the stack frame can be seen as below:

Obviously, the parameter n is at EBP+8. This

add esp, 0FFFFFFF8h

just means

sub esp, 08h

moving the stack pointer ESP down eight bytes to create the two DWORD variables pre and cur. Finally the LEAVE instruction implicitly does

mov esp, ebp
pop ebp

which copies EBP back into ESP, releasing the local variables pre and cur, and then restores the caller’s EBP. And this releases n, at EBP+8, as the STDCALL calling convention requires:

ret 4
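In a high-level language the same frame handling is generated by the compiler. As a rough C counterpart, assuming n is at least 1 as in the MASM version, the sketch below keeps pre and cur as automatic variables. The function name is hypothetical, and where a compiler actually places the variables (stack or registers) depends on its optimization settings.

```c
#include <stdint.h>

/* nth Fibonacci number for n >= 1, mirroring FibonacciByLocalVariable. */
uint32_t fibonacci_by_local_variable(uint32_t n) {
    uint32_t pre = 0, cur = 0;   /* automatic variables, like LOCAL pre, cur */
    uint32_t result = 1;         /* plays the role of EAX */
    while (n-- > 0) {
        result += pre;           /* add eax, pre */
        pre = cur;               /* mov edx, cur / mov pre, edx */
        cur = result;            /* mov cur, eax */
    }
    return result;               /* locals cease to exist on return */
}
```

Calling it with 12 returns 144, the same value the MASM procedure leaves in EAX.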

32. When LOCAL used in a macro

For the macro implementation, I copy almost the same code from FibonacciByLocalVariable. Since a macro has no USES, I manually PUSH/POP ECX and EDX. Also, without a stack frame, I have to create global variables mPre and mCur in the data segment. The mFibonacciByMacro macro can look like this:

;------------------------------------------------------------
mFibonacciByMacro MACRO n
; Receives: Input n 
; Returns: EAX, nth Fibonacci number
;------------------------------------------------------------
LOCAL mPre, mCur, mL
.data
   mPre DWORD ?
   mCur DWORD ?

.code
   push ecx
   push edx

   mov   ecx,n
   mov   eax,1         
   mov   mPre,0         
   mov   mCur,0         
mL:
   add  eax, mPre      ; eax = current + previous     
   mov  edx, mCur 
   mov  mPre, edx
   mov  mCur, eax
   loop   mL

   pop edx
   pop ecx
ENDM

If you just want to call mFibonacciByMacro once, for example

mFibonacciByMacro 12

You don’t need LOCAL here. Let’s simply comment it out:

; LOCAL mPre, mCur, mL

mFibonacciByMacro accepts the argument 12 and replaces n with 12. This works fine, with the following listing generated by MASM:

              mFibonacciByMacro 12
0000018C           1   .data
0000018C 00000000        1      mPre DWORD ?
00000190 00000000        1      mCur DWORD ?
00000000           1   .code
00000000  51           1      push ecx
00000001  52           1      push edx
00000002  B9 0000000C       1      mov   ecx,12
00000007  B8 00000001       1      mov   eax,1
0000000C  C7 05 0000018C R  1      mov   mPre,0
     00000000
00000016  C7 05 00000190 R  1      mov   mCur,0
     00000000
00000020           1   mL:
00000020  03 05 0000018C R  1      add  eax,mPre      ; eax = current + previous
00000026  8B 15 00000190 R  1      mov edx, mCur
0000002C  89 15 0000018C R  1      mov mPre, edx
00000032  A3 00000190 R     1      mov mCur, eax
00000037  E2 E7        1      loop   mL
00000039  5A           1      pop edx
0000003A  59           1      pop ecx

Nothing changed from the original code with just a substitution of 12. The variables mPre and mCur are visible explicitly. Now let’s call it twice, like

mFibonacciByMacro 12
mFibonacciByMacro 13

This is still fine for the first mFibonacciByMacro 12, but the second, mFibonacciByMacro 13, causes three redefinition errors in preprocessing. The assembler complains not only about the data labels, i.e., the variables mPre and mCur, but also about the code label mL. This is because in assembly code each label is actually a memory address, and the second occurrence of mPre, mCur, or mL would need another memory location, rather than redefining one already created:

               mFibonacciByMacro 12
 0000018C           1   .data
 0000018C 00000000        1      mPre DWORD ?
 00000190 00000000        1      mCur DWORD ?
 00000000           1   .code
 00000000  51           1      push ecx
 00000001  52           1      push edx
 00000002  B9 0000000C       1      mov   ecx,12
 00000007  B8 00000001       1      mov   eax,1         
 0000000C  C7 05 0000018C R  1      mov   mPre,0         
      00000000
 00000016  C7 05 00000190 R  1      mov   mCur,0         
      00000000
 00000020           1   mL:
 00000020  03 05 0000018C R  1      add  eax,mPre      ; eax = current + previous     
 00000026  8B 15 00000190 R  1      mov edx, mCur 
 0000002C  89 15 0000018C R  1      mov mPre, edx
 00000032  A3 00000190 R     1      mov mCur, eax
 00000037  E2 E7        1      loop   mL
 00000039  5A           1      pop edx
 0000003A  59           1      pop ecx

               mFibonacciByMacro 13
 00000194           1   .data
              1      mPre DWORD ?
FibTest.32.asm(83) : error A2005:symbol redefinition : mPre
 mFibonacciByMacro(6): Macro Called From
  FibTest.32.asm(83): Main Line Code
              1      mCur DWORD ?
FibTest.32.asm(83) : error A2005:symbol redefinition : mCur
 mFibonacciByMacro(7): Macro Called From
  FibTest.32.asm(83): Main Line Code
 0000003B           1   .code
 0000003B  51           1      push ecx
 0000003C  52           1      push edx
 0000003D  B9 0000000D       1      mov   ecx,13
 00000042  B8 00000001       1      mov   eax,1         
 00000047  C7 05 0000018C R  1      mov   mPre,0         
      00000000
 00000051  C7 05 00000190 R  1      mov   mCur,0         
      00000000
              1   mL:
FibTest.32.asm(83) : error A2005:symbol redefinition : mL
 mFibonacciByMacro(17): Macro Called From
  FibTest.32.asm(83): Main Line Code
 0000005B  03 05 0000018C R  1      add  eax,mPre      ; eax = current + previous     
 00000061  8B 15 00000190 R  1      mov edx, mCur 
 00000067  89 15 0000018C R  1      mov mPre, edx
 0000006D  A3 00000190 R     1      mov mCur, eax
 00000072  E2 AC        1      loop   mL
 00000074  5A           1      pop edx
 00000075  59           1      pop ecx

To fix this, let’s restore this line:

LOCAL mPre, mCur, mL

Running mFibonacciByMacro twice again with 12 and 13 works fine this time, and we have:

              mFibonacciByMacro 12
0000018C           1   .data
0000018C 00000000        1      ??0000 DWORD ?
00000190 00000000        1      ??0001 DWORD ?
00000000           1   .code
00000000  51           1      push ecx
00000001  52           1      push edx
00000002  B9 0000000C       1      mov   ecx,12
00000007  B8 00000001       1      mov   eax,1
0000000C  C7 05 0000018C R  1      mov   ??0000,0
     00000000
00000016  C7 05 00000190 R  1      mov   ??0001,0
     00000000
00000020           1   ??0002:
00000020  03 05 0000018C R  1      add  eax,??0000      ; eax = current + previous
00000026  8B 15 00000190 R  1      mov edx, ??0001
0000002C  89 15 0000018C R  1      mov ??0000, edx
00000032  A3 00000190 R     1      mov ??0001, eax
00000037  E2 E7        1      loop   ??0002
00000039  5A           1      pop edx
0000003A  59           1      pop ecx

              mFibonacciByMacro 13
00000194           1   .data
00000194 00000000        1      ??0003 DWORD ?
00000198 00000000        1      ??0004 DWORD ?
0000003B           1   .code
0000003B  51           1      push ecx
0000003C  52           1      push edx
0000003D  B9 0000000D       1      mov   ecx,13
00000042  B8 00000001       1      mov   eax,1
00000047  C7 05 00000194 R  1      mov   ??0003,0
     00000000
00000051  C7 05 00000198 R  1      mov   ??0004,0
     00000000
0000005B           1   ??0005:
0000005B  03 05 00000194 R  1      add  eax,??0003      ; eax = current + previous
00000061  8B 15 00000198 R  1      mov edx, ??0004
00000067  89 15 00000194 R  1      mov ??0003, edx
0000006D  A3 00000198 R     1      mov ??0004, eax
00000072  E2 E7        1      loop   ??0005
00000074  5A           1      pop edx
00000075  59           1      pop ecx

Now the label names mPre, mCur, and mL are not visible. Instead, when expanding the first mFibonacciByMacro 12, the preprocessor generates three system labels ??0000, ??0001, and ??0002 for mPre, mCur, and mL. And for the second mFibonacciByMacro 13, we can find another three system generated labels ??0003, ??0004, and ??0005 for mPre, mCur, and mL. In this way, MASM resolves the redefinition issue in multiple macro expansions. You must declare your labels with the LOCAL directive in a macro.
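C programmers meet a similar problem with goto labels, which have function scope: a macro containing a fixed label cannot be expanded twice in one function. As a loose analogy only, the sketch below generates a unique label per expansion by pasting __LINE__, much as MASM generates ??0000 and friends. All macro and function names are hypothetical. It assumes n is at least 1 and that each expansion sits on its own source line. Unlike the MASM macro, pre and cur here are block-scoped variables, not data segment variables.

```c
#include <stdint.h>

/* Two-level paste so that __LINE__ is expanded before ## is applied. */
#define PASTE2(a, b) a##b
#define PASTE(a, b)  PASTE2(a, b)
#define UNIQUE(name) PASTE(name, __LINE__)

/* Leaves the nth Fibonacci number (n >= 1) in out. */
#define FIB_BY_MACRO(n, out)                          \
    do {                                              \
        uint32_t pre = 0, cur = 0, count = (n);       \
        (out) = 1;                                    \
    UNIQUE(fib_loop_):                                \
        (out) += pre;                                 \
        pre = cur;                                    \
        cur = (out);                                  \
        if (--count != 0) goto UNIQUE(fib_loop_);     \
    } while (0)

/* Two expansions in one function; each gets its own label name. */
uint32_t fib_12_plus_13(void) {
    uint32_t a, b;
    FIB_BY_MACRO(12, a);
    FIB_BY_MACRO(13, b);
    return a + b;
}
```

With a fixed label in place of UNIQUE(fib_loop_), the second expansion would fail to compile with a duplicate label error, the C counterpart of error A2005 above.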

However, the name LOCAL is misleading, because the system generated ??0000, ??0001, etc. are not limited to the macro’s context. They are really global in scope. To verify, I purposely initialize mPre and mCur as 2 and 3:

LOCAL mPre, mCur, mL
.data
   mPre DWORD 2
   mCur DWORD 3

Then simply try to retrieve the values from ??0000 and ??0001, even before the two mFibonacciByMacro calls in code:

mov esi, ??0000
mov edi, ??0001

mFibonacciByMacro 12
mFibonacciByMacro 13

Perhaps to your surprise, when you set a breakpoint, you can enter &??0000 into the VS debug Address box as with a normal variable. As we can see here, the ??0000 memory address is 0x0116518C with DWORD values 2, 3, and so on. Such a ??0000 is allocated in the data segment together with other properly named variables, as the ASCII strings shown beside it indicate:

To summarize, the LOCAL directive in a macro prevents data/code labels from being globally redefined.

Further, as an interesting test question, consider the following code, which runs mFibonacciByMacro multiple times and works fine without a LOCAL directive in mFibonacciByMacro. Why?

mov ecx, 2
L1:
   mFibonacciByMacro 12
loop L1

Summary

I have talked about miscellaneous features of assembly language programming. Most of them come from our class teaching and assignment discussions [1]. The basic practices are presented here with short code snippets for better understanding, without irrelevant details. The main purpose is to show ideas and methods specific to assembly language, where it has strengths over other languages.

As you may have noticed, I haven’t given complete test code, which requires a programming environment with input and output. For an easy try, you can go to [2] to download the Irvine32 library and set up your MASM programming environment with Visual Studio, although you have to learn a lot in advance to prepare yourself. For example, the statement exit mentioned here in main is not an element of assembly language, but is defined as INVOKE ExitProcess,0 there.

Assembly language is notable for its one-to-one correspondence between an instruction and its machine code, as shown in several listings here. Via assembly code, you can get closer to the heart of the machine, such as registers and memory. Assembly language programming often plays an important role in both academic study and industry development. I hope this article can serve as a useful reference for students and professionals alike.

Assembler & Win32

Unlike DOS programming, where programs written in high-level languages (HLLs) bore little resemblance to their assembly-language counterparts, Win32 applications have much more in common. This is primarily because operating system services in Windows are requested through function calls rather than through interrupts, as was typical for DOS. Parameters are not passed in registers when service functions are called, and consequently there is no multitude of result values returned in the general-purpose registers and the flags register. The calling protocols of the system service functions are therefore easier to remember and use. On the other hand, Win32 does not allow a program to work directly with the hardware, a "sin" that DOS programs often committed. In general, writing programs for Win32 has become considerably simpler, owing to the following factors:

no startup code of the kind typical of applications and dynamic-link libraries written for Windows 3.x;
a flexible memory addressing scheme: memory can be addressed through any general-purpose register, and segment registers are effectively "absent";
large amounts of virtual memory are available;
a rich set of operating system services, with an abundance of functions that ease application development;
a wide variety of readily available tools for building the user interface (dialogs, menus, and so on).

Modern assemblers, TASM 5.0 from Borland International Inc. among them, have in turn developed facilities that used to be found only in HLLs. These include procedure-call macros, procedure templates (prototype declarations), and even object-oriented extensions. At the same time, the assembler has kept such a fine tool as user-defined macros, which have no full equivalent in any HLL.

All these factors make it possible to regard the assembler as a self-sufficient tool for writing applications for the Win32 platforms (Windows NT and Windows 95). To illustrate this point, consider a simple example of an application that works with a dialog box.

Example 1. A program that works with a dialog
Application source file dlg.asm
IDEAL
P586
RADIX 16
MODEL FLAT
%NOINCL
%NOLIST
include "winconst.inc" ; API Win32 consts
include "winptype.inc" ; API Win32 functions prototype
include "winprocs.inc" ; API Win32 function
include "resource.inc" ; resource consts
MAX_USER_NAME = 20 ; hexadecimal because of RADIX 16, i.e. 32 bytes
DataSeg
szAppName db 'Demo 1', 0
szHello db 'Hello, ' ; no terminating zero: szUser follows immediately
szUser db MAX_USER_NAME dup (0)
CodeSeg
Start: call GetModuleHandleA, 0
call DialogBoxParamA, eax, IDD_DIALOG, 0, offset DlgProc, 0
cmp eax,IDOK
jne bye
call MessageBoxA, 0, offset szHello, \
offset szAppName, \
MB_OK or MB_ICONINFORMATION
bye: call ExitProcess, 0
public stdcall DlgProc
proc DlgProc stdcall
arg @@hDlg :dword, @@iMsg :dword, @@wPar :dword, @@lPar :dword
mov eax,[@@iMsg]
cmp eax,WM_INITDIALOG
je @@init
cmp eax,WM_COMMAND
jne @@ret_false
mov eax,[@@wPar]
cmp eax,IDCANCEL
je @@cancel
cmp eax,IDOK
jne @@ret_false
call GetDlgItemTextA, [@@hDlg], IDR_NAME, \
offset szUser, MAX_USER_NAME
mov eax,IDOK ; GetDlgItemTextA overwrote eax, restore the result code
@@cancel: call EndDialog, [@@hDlg], eax
@@ret_false: xor eax,eax
ret
@@init: call GetDlgItem, [@@hDlg], IDR_NAME
call SetFocus, eax
jmp @@ret_false
endp DlgProc
end Start
Resource file dlg.rc

#include "resource.h"
IDD_DIALOG DIALOGEX 0, 0, 187, 95
STYLE DS_MODALFRAME | DS_3DLOOK | WS_POPUP | WS_CAPTION | WS_SYSMENU
EXSTYLE WS_EX_CLIENTEDGE
CAPTION "Dialog"
FONT 8, "MS Sans Serif"
BEGIN
DEFPUSHBUTTON "OK",IDOK,134,76,50,14
PUSHBUTTON "Cancel",IDCANCEL,73,76,50,14
LTEXT "Type your name",IDC_STATIC,4,36,52,8
EDITTEXT IDR_NAME,72,32,112,14,ES_AUTOHSCROLL
END
The remaining files for this example are given in Appendix 1.

Immediately after the Start label, the program calls the Win32 API function GetModuleHandle to obtain the handle of the current module (this parameter is more often called the instance handle). With the handle in hand, we invoke the dialog, which was created either by hand or with a resource builder. The program then checks the result returned by the dialog box. If the user left the dialog by pressing the OK button, the application displays a MessageBox with a greeting.

The dialog procedure handles the following messages. When the dialog is initialized (WM_INITDIALOG), it asks Windows to set the focus to the user name input field. The WM_COMMAND message is handled as follows: the code of the pressed button is checked. If OK was pressed, the user's input is copied into the variable szUser; if Cancel was pressed, nothing is copied. In both cases the procedure then calls EndDialog, the function that closes the dialog. All other messages in the WM_COMMAND group are simply ignored, leaving Windows to perform its default processing.
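One detail of the dlg.asm listing is easy to miss: szHello has no terminating zero and szUser is declared right after it, so the single pointer passed to MessageBoxA covers the greeting and the name together. The sketch below is only a Python model of that memory layout, not part of the original program. The helper names build_data_segment, store_user_name, and read_c_string are hypothetical, and the truncation rule only approximates what GetDlgItemTextA does.

```python
# Model of the data segment in dlg.asm: szHello is followed directly by szUser.
HELLO = b"Hello, "
MAX_USER_NAME = 0x20  # written as 20 in the listing, hexadecimal under RADIX 16


def build_data_segment():
    """Lay out szHello and, right after it, the zero-filled szUser buffer."""
    return bytearray(HELLO) + bytearray(MAX_USER_NAME)


def store_user_name(segment, name):
    """Roughly imitate GetDlgItemTextA: copy the text, truncated, plus a zero."""
    offset = len(HELLO)                       # where szUser starts
    data = name.encode("ascii")[:MAX_USER_NAME - 1]
    segment[offset:offset + len(data) + 1] = data + b"\x00"


def read_c_string(segment, start=0):
    """Read bytes up to the first zero, the way a C string is consumed."""
    end = segment.index(0, start)
    return segment[start:end].decode("ascii")


segment = build_data_segment()
store_user_name(segment, "Anna")
greeting = read_c_string(segment)             # reading starts at szHello
```

Reading from the start of the segment yields the greeting and the name as one string, which is why the program needs no explicit concatenation step.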

You can compare this program with an equivalent one written in an HLL; the difference in the source will be insignificant. Those who wrote assembly applications for Windows 3.x will certainly notice that the complex and cumbersome startup code is no longer needed. The application now looks simpler and more natural.

Example 2. A dynamic-link library
Writing dynamic-link libraries for Win32 has also become much simpler than it was under Windows 3.x. There is no longer any need to insert startup code, and the four initialization/deinitialization events at the process and thread level seem logical.

Consider a simple example of a dynamic-link library with a single function that converts an integer to a string in hexadecimal notation.
File mylib.asm

Ideal
P586
Radix 16
Model flat
DLL_PROCESS_ATTACH = 1

extrn GetVersion: proc

DataSeg
hInst dd 0
OSVer dw 0

CodeSeg
proc libEntry stdcall
arg @@hInst :dword, @@rsn :dword, @@rsrv :dword
cmp [@@rsn],DLL_PROCESS_ATTACH
jne @@1
call GetVersion
mov [OSVer],ax
mov eax,[@@hInst]
mov [hInst],eax
@@1: mov eax,1
ret
endP libEntry

public stdcall Hex2Str
proc Hex2Str stdcall
arg @@num :dword, @@str :dword
uses ebx
mov eax,[@@num]
mov ebx,[@@str]
mov ecx,7
@@1: mov edx,eax
shr eax,4
and edx,0F
cmp edx,0A
jae @@2
add edx,'0'
jmp @@3
@@2: add edx,'A' - 0A
@@3: mov [byte ebx + ecx],dl
dec ecx
jns @@1
mov [byte ebx + 8],0
ret
endp Hex2Str

end libEntry
The remaining files needed for this example can be found in Appendix 2.

Brief comments on the dynamic-link library

The libEntry procedure is the entry point of the dynamic-link library. It does not need to be declared as exported; the loader locates it on its own. libEntry can be called in four situations:

when the library is mapped into the address space of a process (DLL_PROCESS_ATTACH), either at process startup or through a call to LoadLibrary;
when the process creates a new thread (DLL_THREAD_ATTACH);
when a thread exits cleanly (DLL_THREAD_DETACH);
when the library is unmapped from the address space of the process (DLL_PROCESS_DETACH).

Our example handles only the first of these events, DLL_PROCESS_ATTACH. While handling it, the library queries the OS version and saves it, along with its own instance handle.
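The listing stores only ax, the low word of the GetVersion result, in OSVer. As a small illustration of what that word holds, here is a Python sketch that assumes the documented layout of the GetVersion return value (major version in the low byte, minor version in the high byte). The function name decode_os_version is hypothetical and does not appear in the library.

```python
def decode_os_version(os_ver):
    """Split the 16-bit value saved in OSVer into (major, minor)."""
    major = os_ver & 0xFF           # low byte of the word
    minor = (os_ver >> 8) & 0xFF    # high byte of the word
    return major, minor


# Sample words chosen for illustration: version 4.0 and version 3.51.
version_4_0 = decode_os_version(0x0004)
version_3_51 = decode_os_version(0x3303)
```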

The library contains a single exported function, which hardly needs explanation. It is worth noting, though, how the converted digits are stored. The addressing through two general-purpose registers, ebx + ecx, is of interest: it lets us use ecx both as the loop counter and as a component of the address.
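To make the loop in Hex2Str easier to trace, the following Python sketch mirrors it step by step, with variables named after the registers. It is a model for reading the listing, not a replacement for the library function, and it returns the string instead of writing through a caller-supplied pointer.

```python
def hex2str(num):
    """Mirror the Hex2Str loop: fill an 8-character buffer from the right."""
    buf = bytearray(9)              # 8 digits plus the zero stored at [ebx + 8]
    eax = num & 0xFFFFFFFF          # the routine works on a 32-bit value
    ecx = 7                         # index of the last digit, also the counter
    while ecx >= 0:                 # jns @@1: repeat until ecx goes negative
        edx = eax & 0x0F            # and edx,0F: take the lowest 4 bits
        eax >>= 4                   # shr eax,4: move on to the next digit
        if edx >= 0x0A:
            edx += ord("A") - 0x0A  # digits 10..15 become 'A'..'F'
        else:
            edx += ord("0")         # digits 0..9 become '0'..'9'
        buf[ecx] = edx              # mov [byte ebx + ecx],dl
        ecx -= 1                    # dec ecx
    return buf[:8].decode("ascii")


sample = hex2str(0x1A2B3C4D)
```

Because ecx counts down from 7 to 0, the least significant digit lands in the last position and no separate reversal step is needed.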

Example 3. A windowed application
File dmenu.asm

Ideal
P586
Radix 16
Model flat

struc WndClassEx
cbSize dd 0
style dd 0
lpfnWndProc dd 0
cbClsExtra dd 0
cbWndExtra dd 0
hInstance dd 0
hIcon dd 0
hCursor dd 0
hbrBackground dd 0
lpszMenuName dd 0
lpszClassName dd 0
hIconSm dd 0
ends WndClassEx

struc Point
x dd 0
y dd 0
ends Point

struc msgStruc
hwnd dd 0
message dd 0
wParam dd 0
lParam dd 0
time dd 0
pnt Point <>
ends msgStruc

MyMenu = 0065
ID_OPEN = 9C41
ID_SAVE = 9C42
ID_EXIT = 9C43

CS_HREDRAW = 0001
CS_VREDRAW = 0002
IDI_APPLICATION = 7F00
IDC_ARROW = 00007F00
COLOR_WINDOW = 5
WS_EX_WINDOWEDGE = 00000100
WS_EX_CLIENTEDGE = 00000200
WS_EX_OVERLAPPEDWINDOW = WS_EX_WINDOWEDGE OR WS_EX_CLIENTEDGE
WS_OVERLAPPED = 00000000
WS_CAPTION = 00C00000
WS_SYSMENU = 00080000
WS_THICKFRAME = 00040000
WS_MINIMIZEBOX = 00020000
WS_MAXIMIZEBOX = 00010000
WS_OVERLAPPEDWINDOW = WS_OVERLAPPED OR WS_CAPTION OR \
WS_SYSMENU OR WS_THICKFRAME OR \
WS_MINIMIZEBOX OR WS_MAXIMIZEBOX
CW_USEDEFAULT = 80000000
SW_SHOW = 5
WM_COMMAND = 0111
WM_DESTROY = 0002
WM_CLOSE = 0010
MB_OK = 0

PROCTYPE ptGetModuleHandle stdcall \
lpModuleName :dword

PROCTYPE ptLoadIcon stdcall \
hInstance :dword, \
lpIconName :dword

PROCTYPE ptLoadCursor stdcall \
hInstance :dword, \
lpCursorName :dword

PROCTYPE ptLoadMenu stdcall \
hInstance :dword, \
lpMenuName :dword

PROCTYPE ptRegisterClassEx stdcall \
lpwcx :dword

PROCTYPE ptCreateWindowEx stdcall \
dwExStyle :dword, \
lpClassName :dword, \
lpWindowName :dword, \
dwStyle :dword, \
x :dword, \
y :dword, \
nWidth :dword, \
nHeight :dword, \
hWndParent :dword, \
hMenu :dword, \
hInstance :dword, \
lpParam :dword

PROCTYPE ptShowWindow stdcall \
hWnd :dword, \
nCmdShow :dword

PROCTYPE ptUpdateWindow stdcall \
hWnd :dword

PROCTYPE ptGetMessage stdcall \
pMsg :dword, \
hWnd :dword, \
wMsgFilterMin :dword, \
wMsgFilterMax :dword

PROCTYPE ptTranslateMessage stdcall \
lpMsg :dword

PROCTYPE ptDispatchMessage stdcall \
pmsg :dword

PROCTYPE ptSetMenu stdcall \
hWnd :dword, \
hMenu :dword

PROCTYPE ptPostQuitMessage stdcall \
nExitCode :dword

PROCTYPE ptDefWindowProc stdcall \
hWnd :dword, \
Msg :dword, \
wParam :dword, \
lParam :dword

PROCTYPE ptSendMessage stdcall \
hWnd :dword, \
Msg :dword, \
wParam :dword, \
lParam :dword

PROCTYPE ptMessageBox stdcall \
hWnd :dword, \
lpText :dword, \
lpCaption :dword, \
uType :dword

PROCTYPE ptExitProcess stdcall \
exitCode :dword

extrn GetModuleHandleA :ptGetModuleHandle
extrn LoadIconA :ptLoadIcon
extrn LoadCursorA :ptLoadCursor
extrn RegisterClassExA :ptRegisterClassEx
extrn LoadMenuA :ptLoadMenu
extrn CreateWindowExA :ptCreateWindowEx
extrn ShowWindow :ptShowWindow
extrn UpdateWindow :ptUpdateWindow
extrn GetMessageA :ptGetMessage
extrn TranslateMessage :ptTranslateMessage
extrn DispatchMessageA :ptDispatchMessage
extrn SetMenu :ptSetMenu
extrn PostQuitMessage :ptPostQuitMessage
extrn DefWindowProcA :ptDefWindowProc
extrn SendMessageA :ptSendMessage
extrn MessageBoxA :ptMessageBox
extrn ExitProcess :ptExitProcess

UDataSeg
hInst dd ?
hWnd dd ?

IFNDEF VER1
hMenu dd ?
ENDIF

DataSeg
msg msgStruc <>
classTitle db 'Menu demo', 0
wndTitle db 'Demo program', 0
msg_open_txt db 'You selected open', 0
msg_open_tlt db 'Open box', 0
msg_save_txt db 'You selected save', 0
msg_save_tlt db 'Save box', 0

CodeSeg
Start: call GetModuleHandleA, 0 ; obtain hInstance
mov [hInst],eax

sub esp,SIZE WndClassEx ; reserve space on the stack
; fill in the WndClassEx structure
mov [(WndClassEx esp).cbSize],SIZE WndClassEx
mov [(WndClassEx esp).style],CS_HREDRAW or CS_VREDRAW
mov [(WndClassEx esp).lpfnWndProc],offset WndProc
mov [(WndClassEx esp).cbWndExtra],0
mov [(WndClassEx esp).cbClsExtra],0
mov [(WndClassEx esp).hInstance],eax
call LoadIconA, 0, IDI_APPLICATION
mov [(WndClassEx esp).hIcon],eax
call LoadCursorA, 0, IDC_ARROW
mov [(WndClassEx esp).hCursor],eax
mov [(WndClassEx esp).hbrBackground],COLOR_WINDOW
IFDEF VER1
mov [(WndClassEx esp).lpszMenuName],MyMenu
ELSE
mov [(WndClassEx esp).lpszMenuName],0
ENDIF
mov [(WndClassEx esp).lpszClassName],offset classTitle
mov [(WndClassEx esp).hIconSm],0
call RegisterClassExA, esp ; register the window class

add esp,SIZE WndClassEx ; restore the stack
; create the window

IFNDEF VER2
call CreateWindowExA, WS_EX_OVERLAPPEDWINDOW, \ extended window style
offset classTitle, \ pointer to registered class name
offset wndTitle, \ pointer to window name
WS_OVERLAPPEDWINDOW, \ window style
CW_USEDEFAULT, \ horizontal position of window
CW_USEDEFAULT, \ vertical position of window
CW_USEDEFAULT, \ window width
CW_USEDEFAULT, \ window height
0, \ handle to parent or owner window
0, \ handle to menu, or child-window
\ identifier
[hInst], \ handle to application instance
0 ; pointer to window-creation data
ELSE
call LoadMenuA, [hInst], MyMenu
mov [hMenu],eax
call CreateWindowExA, WS_EX_OVERLAPPEDWINDOW, \ extended window style
offset classTitle, \ pointer to registered class name
offset wndTitle, \ pointer to window name
WS_OVERLAPPEDWINDOW, \ window style
CW_USEDEFAULT, \ horizontal position of window
CW_USEDEFAULT, \ vertical position of window
CW_USEDEFAULT, \ window width
CW_USEDEFAULT, \ window height
0, \ handle to parent or owner window
eax, \ handle to menu, or child-window
\ identifier
[hInst], \ handle to application instance
0 ; pointer to window-creation data
ENDIF
mov [hWnd],eax
call ShowWindow, eax, SW_SHOW ; show window
call UpdateWindow, [hWnd] ; redraw window

IFDEF VER3
call LoadMenuA, [hInst], MyMenu
mov [hMenu],eax
call SetMenu, [hWnd], eax
ENDIF

msg_loop:
call GetMessageA, offset msg, 0, 0, 0
or ax,ax
jz exit
call TranslateMessage, offset msg
call DispatchMessageA, offset msg
jmp msg_loop
exit: call ExitProcess, 0

public stdcall WndProc
proc WndProc stdcall
arg @@hwnd: dword, @@msg: dword, @@wPar: dword, @@lPar: dword
mov eax,[@@msg]
cmp eax,WM_COMMAND
je @@command
cmp eax,WM_DESTROY
jne @@default
call PostQuitMessage, 0
xor eax,eax
jmp @@ret
@@default:
call DefWindowProcA, [@@hwnd], [@@msg], [@@wPar], [@@lPar]
@@ret: ret
@@command:
mov eax,[@@wPar]
cmp eax,ID_OPEN
je @@open
cmp eax,ID_SAVE
je @@save
call SendMessageA, [@@hwnd], WM_CLOSE, 0, 0
xor eax,eax
jmp @@ret
@@open: mov eax, offset msg_open_txt
mov edx, offset msg_open_tlt
jmp @@mess
@@save: mov eax, offset msg_save_txt
mov edx, offset msg_save_tlt
@@mess: call MessageBoxA, 0, eax, edx, MB_OK
xor eax,eax
jmp @@ret
endp WndProc
end Start
Comments on the program
My main goal here was to demonstrate the use of Win32 API function prototypes. Of course, they (along with the definitions of Win32 API constants and structures) should be moved into separate include files, since you will most likely use them in other programs as well. Prototype declarations let the assembler strictly check the number and types of the parameters passed to functions. This makes the programmer's life considerably easier by catching mistakes that would otherwise surface only at run time, all the more so because some Win32 API functions take a great many parameters.

The point of this program is to demonstrate different ways of working with a window menu. The program can be built in three variants (versions) by passing the assembler the VER2 or VER3 switch (VER1 is used by default). In the first variant the menu is defined at the window class level, so all windows of this class will have the same menu. In the second variant the menu is specified when the window is created, as a parameter of CreateWindowEx. The window class has no menu, so each window of the class can have its own. Finally, in the third variant the menu is loaded after the window has been created. This variant shows how to attach a menu to an already existing window.

Conditional assembly directives make it possible to keep all the variants in the text of a single program. This technique is convenient not only for demonstration but also for debugging. For example, when you need to add a new fragment of code to a program, you can use it so as not to lose a working module. And, of course, conditional assembly directives are the most convenient way of testing different solutions (algorithms) in one module.

Of some interest is the use of stack frames and the filling of structures on the stack through the stack pointer register (esp). This is exactly what is demonstrated when the WndClassEx structure is filled in. Space on the stack (a frame) is allocated simply by moving esp: sub esp,SIZE WndClassEx

We can now access the allocated memory through the same stack pointer register. This was not possible when writing 16-bit applications. The technique can be used inside any procedure, or indeed at any point in a program. The overhead of such an allocation is minimal, but bear in mind that the stack size is limited, so placing large amounts of data on the stack is hardly advisable. For that purpose it is better to use heaps or virtual memory.
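To see how much space sub esp,SIZE WndClassEx actually reserves and where each field ends up relative to esp, the structure can be modelled with Python's ctypes module. This is only an illustrative sketch: the field list is copied from the struc definition in dmenu.asm, while field_offset and the sample esp value are hypothetical.

```python
import ctypes


class WndClassEx(ctypes.Structure):
    """Twelve dword fields, in the order declared in dmenu.asm."""
    _fields_ = [(name, ctypes.c_uint32) for name in (
        "cbSize", "style", "lpfnWndProc", "cbClsExtra", "cbWndExtra",
        "hInstance", "hIcon", "hCursor", "hbrBackground",
        "lpszMenuName", "lpszClassName", "hIconSm")]


FRAME_SIZE = ctypes.sizeof(WndClassEx)   # the value of SIZE WndClassEx


def field_offset(name):
    """Displacement of a field from esp once the frame is allocated."""
    return getattr(WndClassEx, name).offset


esp_before = 0x0012FFC0                  # made-up stack pointer value
esp_after = esp_before - FRAME_SIZE      # sub esp,SIZE WndClassEx
```

With this layout the frame takes 48 bytes, and an operand such as [(WndClassEx esp).lpfnWndProc] refers to the address esp + 8.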

The rest of the program is fairly trivial and needs no explanation. The topic of macros may prove more interesting.

Macros
I seldom had to work seriously on macro definitions when programming for DOS. In Win32 the situation is fundamentally different. Here well-written macros not only make programs easier to read and understand but also genuinely ease the programmer's work. The reason is that in Win32 fragments of code are often repeated with only minor differences. The most telling example is the window and/or dialog procedure. In both cases we determine the type of the message and pass control to the section of code responsible for handling it. If a program makes heavy use of dialog boxes, such repeated fragments clutter it badly and make it hard to read. Using macros in such situations is more than justified. The following definition can serve as the basis for a macro that dispatches incoming messages to their handlers.

Example of macro definitions

macro MessageVector message1, message2:REST
IFNB <message1>
dd message1 ; message code
dd offset @@&message1 ; address of its handler

@@VecCount = @@VecCount + 1
MessageVector message2
ENDIF
endm MessageVector

macro WndMessages VecName, message1, message2:REST
@@VecCount = 0
DataSeg
label @@&VecName dword
MessageVector message1, message2
@@&VecName&Cnt = @@VecCount
CodeSeg
mov ecx,@@&VecName&Cnt

mov eax,[@@msg]
@@&VecName&_1: dec ecx
js @@default
cmp eax,[dword ecx * 8 + offset @@&VecName]
jne @@&VecName&_1
jmp [dword ecx * 8 + offset @@&VecName + 4] ; each entry is 8 bytes long

@@default: call DefWindowProcA, [@@hWnd], [@@msg], [@@wPar], [@@lPar]
@@ret: ret
@@ret_false: xor eax,eax
jmp @@ret
@@ret_true: mov eax,-1
dec eax
jmp @@ret
endm WndMessages
Comments on the macros
When writing a window procedure you can use the WndMessages macro, listing as its parameters the messages you intend to handle. The window procedure then takes the following form:

proc WndProc stdcall
arg @@hWnd: dword, @@msg: dword, @@wPar: dword, @@lPar: dword
WndMessages WndVector, WM_CREATE, WM_SIZE, WM_PAINT, WM_CLOSE, WM_DESTROY

@@WM_CREATE:
; handle the WM_CREATE message here
@@WM_SIZE:
; handle the WM_SIZE message here
@@WM_PAINT:
; handle the WM_PAINT message here
@@WM_CLOSE:
; handle the WM_CLOSE message here
@@WM_DESTROY:
; handle the WM_DESTROY message here

endp WndProc
The handling of each message can finish in one of three ways:

return TRUE, by jumping to the label @@ret_true;
return FALSE, by jumping to the label @@ret_false;
fall back to default processing, by jumping to the label @@default.

Note that all of these labels are defined in the WndMessages macro, and you should not define them again in the body of the procedure.

Now let us see what happens when the WndMessages macro is invoked. First the counter of the macro's parameters is set to zero (the number of parameters can be arbitrary). Next, a label is created in the data segment with the name passed to the macro as its first parameter. The label name is formed by concatenating the characters @@ with the vector name, which is achieved with the & operator. For example, if the name TestLabel is passed, the label becomes @@TestLabel. Immediately after the label declaration another macro, MessageVector, is invoked and given all the remaining parameters, which must be nothing other than the list of messages to be handled in the window procedure. The structure of MessageVector is simple and straightforward. It takes the first parameter and stores the message code in a dword memory cell. The next dword cell receives the address of the handler label, whose name is formed by the rule described above. The message counter is incremented by one. Then comes a recursive invocation that passes on the messages not yet registered, and this continues until the message list is exhausted.

At this point the WndMessages macro can begin dispatching. The essence of that processing should by now be clear without further explanation.
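For readers who prefer to trace the dispatch loop outside the macro syntax, here is a Python model of the table that MessageVector builds and of the backward scan that WndMessages performs over it. It is a sketch rather than generated code: make_handler and dispatch are hypothetical names, the handlers merely report which label would be reached, and the numeric message codes are the standard Win32 values.

```python
WM_CREATE, WM_DESTROY, WM_SIZE, WM_PAINT, WM_CLOSE = 0x0001, 0x0002, 0x0005, 0x000F, 0x0010


def make_handler(label):
    """Stand-in handler that only reports which label control reached."""
    return lambda: label


# What MessageVector emits: one (message code, handler address) pair per message.
wnd_vector = [
    (WM_CREATE, make_handler("@@WM_CREATE")),
    (WM_SIZE, make_handler("@@WM_SIZE")),
    (WM_PAINT, make_handler("@@WM_PAINT")),
    (WM_CLOSE, make_handler("@@WM_CLOSE")),
    (WM_DESTROY, make_handler("@@WM_DESTROY")),
]


def dispatch(vector, msg):
    """Scan the vector from the last entry to the first, as the macro does."""
    ecx = len(vector)                 # mov ecx,@@WndVectorCnt
    while True:
        ecx -= 1                      # dec ecx
        if ecx < 0:                   # js @@default
            return "@@default"
        code, handler = vector[ecx]   # entry number ecx of the table
        if code == msg:               # compare with the stored message code
            return handler()          # jump through the stored handler address


reached = dispatch(wnd_vector, WM_PAINT)
```

A message that is not in the list, for example WM_COMMAND (0x0111), runs the counter below zero and ends up at @@default.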

Message handling in Windows is not linear; as a rule it forms a hierarchy. For example, a WM_COMMAND message can carry many different notifications coming from menus and/or other controls. The same technique can therefore be applied successfully at the other levels of the cascade, and even simplified somewhat. Indeed, we cannot change the codes of the messages arriving at a window or dialog procedure, but the choice of the sequence of constants assigned to menu items or controls is ours. In that case there is no need for the additional field that stores the message code. Each element of the vector then contains only the handler address, and finding the right element is very simple: subtract the identifier of the first menu item or first control from the constant received in the message, and the result is the index of the required element. All that remains is to jump to the handler.
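The simplified scheme for consecutive identifiers can be sketched in the same way. The Python model below uses the menu identifiers from Example 3; the handler functions, the name dispatch_command, and the explicit range check are assumptions added for illustration and are not taken from the article's code.

```python
ID_OPEN, ID_SAVE, ID_EXIT = 40001, 40002, 40003
FIRST_ID = ID_OPEN                     # identifier of the first menu item


def on_open():
    return "open"


def on_save():
    return "save"


def on_exit():
    return "exit"


# The vector holds handler addresses only; the position encodes the identifier.
command_vector = [on_open, on_save, on_exit]


def dispatch_command(command_id):
    """Turn an identifier into a vector index by subtracting the first one."""
    index = command_id - FIRST_ID
    if 0 <= index < len(command_vector):   # added guard for unknown identifiers
        return command_vector[index]()
    return None                            # leave anything else to default handling


selected = dispatch_command(ID_SAVE)
```

The search loop disappears entirely: one subtraction and one indexed jump replace the scan, at the cost of keeping the identifiers contiguous.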

On the whole, the subject of macros is both instructive and extensive. I rarely see macros used competently, which is a pity, because they can make working in assembly considerably simpler and more pleasant.

Summary
Not much is needed to write full-fledged applications for Win32:

an assembler and a linker (I use the TASM32 and TLINK32 pair from the TASM 5.0 package). Before using the package I recommend applying the patch for it, which can be obtained from the site http://www.borland.com/ or from our FTP server ftp.uralmet.ru;
a resource editor and a resource compiler (I use Developer Studio and brcc32.exe);
a translation of the header files describing the Win32 API procedures, structures, and constants from C notation into the notation of the chosen assembler mode, Ideal or MASM.

As a result you will be able to write light and elegant Win32 applications with which you can build visual forms, work with databases, handle communications, and work with multimedia tools. As with programming for DOS, you keep the ability to make the fullest use of the processor's resources, while the complexity of writing applications drops significantly thanks to the more powerful operating system services, the more convenient addressing scheme, and the very simple program layout.

Appendix 1. Files required for the first example
Resource constants file resource.inc

IDD_DIALOG = 65 ; 101
IDR_NAME = 3E8 ; 1000
IDC_STATIC = -1
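Because the assembly sources are assembled under RADIX 16, every numeric literal in them is hexadecimal, while the resource.h files list the same identifiers in decimal. The two sets of values have to agree, or the program will ask for resources that do not exist. The Python sketch below shows one way to cross-check them by hand. The dictionaries simply restate values from the listings in this article, and the name mismatches is hypothetical.

```python
# Literals exactly as written in the assembly sources (hexadecimal, RADIX 16).
asm_constants = {
    "IDD_DIALOG": "65",
    "IDR_NAME": "3E8",
    "MyMenu": "0065",
    "ID_OPEN": "9C41",
    "ID_SAVE": "9C42",
    "ID_EXIT": "9C43",
}

# The same identifiers as written in the resource.h files (decimal).
header_constants = {
    "IDD_DIALOG": 101,
    "IDR_NAME": 1000,
    "MyMenu": 101,
    "ID_OPEN": 40001,
    "ID_SAVE": 40002,
    "ID_EXIT": 40003,
}

# Collect every identifier whose hexadecimal and decimal values disagree.
mismatches = {
    name for name, text in asm_constants.items()
    if int(text, 16) != header_constants[name]
}
```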
Definition file dlg.def

NAME TEST
DESCRIPTION 'Demo dialog'
EXETYPE WINDOWS
EXPORTS DlgProc @1
Build file makefile

# Make file for Demo dialog
# make -B
# make -B -DDEBUG for debug information

NAME = dlg
OBJS = $(NAME).obj
DEF = $(NAME).def
RES = $(NAME).res

TASMOPT=/m3 /mx /z /q /DWINVER=0400 /D_WIN32_WINNT=0400

!if $d(DEBUG)
TASMDEBUG=/zi
LINKDEBUG=/v
!else
TASMDEBUG=/l
LINKDEBUG=
!endif

!if $d(MAKEDIR)
IMPORT=$(MAKEDIR)\..\lib\import32
!else
IMPORT=import32
!endif

$(NAME).EXE: $(OBJS) $(DEF) $(RES)
tlink32 /Tpe /aa /c $(LINKDEBUG) $(OBJS),$(NAME),, $(IMPORT), $(DEF), $(RES)

.asm.obj:
tasm32 $(TASMDEBUG) $(TASMOPT) $&.asm

$(RES): $(NAME).RC
BRCC32 -32 $(NAME).RC
Header file resource.h

//{{NO_DEPENDENCIES}}
// Microsoft Developer Studio generated include file.
// Used by dlg.rc
//
#define IDD_DIALOG 101
#define IDR_NAME 1000
#define IDC_STATIC -1

// Next default values for new objects
//
#ifdef APSTUDIO_INVOKED
#ifndef APSTUDIO_READONLY_SYMBOLS
#define _APS_NEXT_RESOURCE_VALUE 102
#define _APS_NEXT_COMMAND_VALUE 40001
#define _APS_NEXT_CONTROL_VALUE 1001
#define _APS_NEXT_SYMED_VALUE 101
#endif
#endif
Appendix 2. Files required for the second example
Definition file mylib.def

LIBRARY MYLIB
DESCRIPTION 'DLL EXAMPLE, 1997'
EXPORTS Hex2Str @1
Build file makefile

# Make file for Demo DLL
# make -B
# make -B -DDEBUG for debug information

NAME = mylib
OBJS = $(NAME).obj
DEF = $(NAME).def
RES = $(NAME).res

TASMOPT=/m3 /mx /z /q /DWINVER=0400 /D_WIN32_WINNT=0400

!if $d(DEBUG)
TASMDEBUG=/zi
LINKDEBUG=/v
!else
TASMDEBUG=/l
LINKDEBUG=
!endif

!if $d(MAKEDIR)
IMPORT=$(MAKEDIR)\..\lib\import32
!else
IMPORT=import32
!endif

$(NAME).EXE: $(OBJS) $(DEF)
tlink32 /Tpd /aa /c $(LINKDEBUG) $(OBJS),$(NAME),, $(IMPORT), $(DEF)

.asm.obj:
tasm32 $(TASMDEBUG) $(TASMOPT) $&.asm

$(RES): $(NAME).RC
BRCC32 -32 $(NAME).RC
Appendix 3. Files required for the third example
Definition file dmenu.def

NAME TEST
DESCRIPTION 'Demo menu'
EXETYPE WINDOWS
EXPORTS WndProc @1
Resource file dmenu.rc

#include "resource.h"
MyMenu MENU DISCARDABLE
BEGIN
POPUP "Files"
BEGIN
MENUITEM "Open", ID_OPEN
MENUITEM "Save", ID_SAVE
MENUITEM SEPARATOR
MENUITEM "Exit", ID_EXIT
END
MENUITEM "Other", 65535
END
Header file resource.h

//{{NO_DEPENDENCIES}}
// Microsoft Developer Studio generated include file.
// Used by dmenu.rc
//
#define MyMenu 101
#define ID_OPEN 40001
#define ID_SAVE 40002
#define ID_EXIT 40003
// Next default values for new objects
//
#ifdef APSTUDIO_INVOKED
#ifndef APSTUDIO_READONLY_SYMBOLS
#define _APS_NEXT_RESOURCE_VALUE 102
#define _APS_NEXT_COMMAND_VALUE 40004
#define _APS_NEXT_CONTROL_VALUE 1000
#define _APS_NEXT_SYMED_VALUE 101
#endif
#endif
Build file makefile

# Make file for Turbo Assembler Demo menu
# make -B
# make -B -DDEBUG -DVERN for debug information and version
NAME = dmenu
OBJS = $(NAME).obj
DEF = $(NAME).def
RES = $(NAME).res
!if $d(DEBUG)
TASMDEBUG=/zi
LINKDEBUG=/v
!else
TASMDEBUG=/l
LINKDEBUG=
!endif

!if $d(VER2)
TASMVER=/dVER2
!elseif $d(VER3)
TASMVER=/dVER3
!else
TASMVER=/dVER1
!endif

!if $d(MAKEDIR)
IMPORT=$(MAKEDIR)\..\lib\import32
!else
IMPORT=import32
!endif

$(NAME).EXE: $(OBJS) $(DEF) $(RES)
tlink32 /Tpe /aa /c $(LINKDEBUG) $(OBJS),$(NAME),, $(IMPORT), $(DEF), $(RES)

.asm.obj:
tasm32 $(TASMDEBUG) $(TASMVER) /m /mx /z /zd $&.asm

$(RES): $(NAME).RC
BRCC32 -32 $(NAME).RC