An EXE file is not as simple as a COM file. The EXE file is designed to allow DOS to execute programs that require more than 64 kilobytes of code, data and stack. When loading an EXE file, DOS makes no a priori assumptions about the size of the file, or about what is code and what is data. All of this information is stored in the EXE file itself, in the EXE Header at the beginning of the file. This header has two parts: a fixed-length portion, and a variable-length table of pointers to segment references in the Load Module, called the Relocation Pointer Table. Since any virus that attacks EXE files must be able to manipulate the data in the EXE Header, we’d better take some time to look at it. Figure 10 is a graphical representation of an EXE file. The meaning of each byte in the header is explained in Table 1.
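To make the header layout concrete, here is a minimal Python sketch that reads the fixed-length portion of the header (the fields of Table 1) and the Relocation Pointer Table. It is an illustration only: the names `ExeHeader` and `parse_exe` are hypothetical, the whole file is assumed to be already in memory as a `bytes` object, and it assumes the common convention that a Last Page Size of 0 means the final page is completely full.

```python
import struct
from typing import List, NamedTuple, Tuple

# Fixed-length portion of the EXE Header: the two signature bytes followed by
# thirteen little-endian 16-bit words, at offsets 00H through 1AH (Table 1).
HEADER_FORMAT = "<2s13H"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 28 bytes


class ExeHeader(NamedTuple):
    signature: bytes
    last_page_size: int
    page_count: int
    reloc_entries: int
    header_paragraphs: int
    minalloc: int
    maxalloc: int
    initial_ss: int
    initial_sp: int
    checksum: int
    initial_ip: int
    initial_cs: int
    reloc_table_offset: int
    overlay_number: int

    def file_size(self) -> int:
        """Size of the file image described by Page Count and Last Page Size."""
        if self.last_page_size == 0:  # assumed convention: 0 means the last page is full
            return self.page_count * 512
        return (self.page_count - 1) * 512 + self.last_page_size

    def load_module_size(self) -> int:
        """Bytes in the Load Module: the file image minus the header."""
        return self.file_size() - self.header_paragraphs * 16


def parse_exe(data: bytes) -> Tuple[ExeHeader, List[Tuple[int, int]]]:
    """Return the header and the relocation pointers as (segment, offset) pairs."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file too short for an EXE header")
    header = ExeHeader(*struct.unpack_from(HEADER_FORMAT, data, 0))
    if header.signature != b"MZ":
        raise ValueError("no MZ signature; DOS would treat this as a COM file")
    pointers = []
    for i in range(header.reloc_entries):
        # Each table entry is a far pointer stored offset first, then segment.
        offset, segment = struct.unpack_from("<2H", data, header.reloc_table_offset + 4 * i)
        pointers.append((segment, offset))
    return header, pointers


# Usage with a synthetic 2050-byte file: a 2-paragraph header holding one pointer.
fields = (2, 5, 1, 2, 0, 0xFFFF, 0, 0x0100, 0, 0, 0, 0x001C, 0)
sample = struct.pack(HEADER_FORMAT, b"MZ", *fields) + struct.pack("<2H", 0x0153, 0x0000)
sample += bytes(2050 - len(sample))
header, pointers = parse_exe(sample)
print(header.file_size(), header.load_module_size(), pointers)
```

The synthetic header values are made up for the demonstration; they simply exercise the Page Count / Last Page Size arithmetic and one relocation entry.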
When DOS loads an EXE file, it uses the Relocation Pointer Table to modify all segment references in the Load Module. After that, the segment references in the memory image of the program point to the correct memory locations. Let’s consider an example (Figure 11): imagine an EXE file with two segments.
The segment at the start of the load module contains a far call to the second segment. In the load module, this call looks like this:
Address      Assembly Language     Machine Code
0000:0150    CALL FAR 0620:0980    9A 80 09 20 06
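The five bytes in this listing can be decoded by hand: the opcode 9AH is followed by a four-byte operand, offset word first and segment word second, each word stored low byte first. A small Python sketch (the function names are hypothetical) decodes the bytes and converts the segment into a byte offset from the start of the load module:

```python
import struct
from typing import Tuple


def decode_far_call(code: bytes) -> Tuple[int, int]:
    """Return (segment, offset) of a 5-byte direct far CALL (opcode 9AH)."""
    if len(code) != 5 or code[0] != 0x9A:
        raise ValueError("not a direct far CALL instruction")
    # The operand is stored offset first, then segment, each word low byte first.
    offset, segment = struct.unpack("<2H", code[1:])
    return segment, offset


def linear_offset(segment: int, offset: int) -> int:
    """Byte offset of segment:offset, i.e. segment * 10H + offset."""
    return segment * 0x10 + offset


seg, off = decode_far_call(bytes([0x9A, 0x80, 0x09, 0x20, 0x06]))
print(f"CALL FAR {seg:04X}:{off:04X}")  # CALL FAR 0620:0980
print(f"{linear_offset(seg, 0):X}H")    # 6200H: start of the second segment
```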
From the segment part of this call, 0620H, one can infer that the start of the second segment is 6200H (= 620H x 10H) bytes from the start of the load module.

Figure 10: The layout of an EXE file. (The EXE File Header, including the Relocation Pointer Table, followed by the EXE Load Module.)

Figure 11: An example of relocating code. (On disk: the Relocation Pointer Table in the EXE Header holds 0000:0153, and the Load Module has CALL FAR 0620:0980 at 0000:0150, calling Routine X at 0620:0980. In RAM, above DOS and the PSP, the executable machine code starts at 2130:0000; the call at 2130:0150 reads CALL FAR 2750:0980, and Routine X is at 2750:0980.)

Table 1: Structure of the EXE Header.

Offset  Size  Name                   Description
00H     2     Signature              The characters M and Z, present in every EXE file, which identify the file as an EXE file. If these bytes are anything else, DOS will try to treat the file as a COM file.
02H     2     Last Page Size         Actual number of bytes in the final 512-byte page of the file (see Page Count).
04H     2     Page Count             The number of 512-byte pages in the file. The last page may be only partially filled, with the number of valid bytes specified in Last Page Size. For example, a file of 2050 bytes would have Page Count = 5 and Last Page Size = 2.
06H     2     Reloc Table Entries    The number of entries in the Relocation Pointer Table.
08H     2     Header Paragraphs      The size of the EXE Header in 16-byte paragraphs, including the relocation table. The header is always a multiple of 16 bytes in length.
0AH     2     MINALLOC               The minimum number of 16-byte paragraphs of memory that the program requires to execute, in addition to the image of the program stored in the file. If enough memory is not available, DOS will return an error when it tries to load the program.
0CH     2     MAXALLOC               The maximum number of 16-byte paragraphs to allocate to the program when it is executed. This is normally set to FFFF Hex, except for TSRs.
0EH     2     Initial ss             The initial value of the stack segment, relative to the start of the code in the EXE file. DOS adjusts it when the file is loaded to obtain the proper value for the ss register.
10H     2     Initial sp             The initial value to set sp to when the program is executed.
12H     2     Checksum               A word-oriented checksum value such that the sum of all words in the file is FFFF Hex. If the file is an odd number of bytes long, the last byte is treated as a word with the high byte = 0. Often this checksum is used for nothing, and some compilers do not even bother to set it properly. The INTRUDER virus will not alter the checksum.
14H     2     Initial ip             The initial value for the instruction pointer, ip, when the program is loaded.
16H     2     Initial cs             The initial value of the code segment, relative to the start of the code in the EXE file. DOS modifies it at load time.
18H     2     Relocation Tbl Offset  Offset of the start of the relocation table from the start of the file, in bytes.
1AH     2     Overlay Number         The resident, primary part of a program always has this word set to zero. Overlays will have different values stored here.

The Relocation Pointer Table would contain a vector 0000:0153 pointing to the segment reference (20 06) of this far call. When DOS loads the program, it might load it starting at segment 2130H, because DOS and some memory-resident programs occupy the memory below this. So DOS would first load the Load Module into memory at 2130:0000. Then it would take the relocation pointer 0000:0153 and transform it into the pointer 2130:0153, which points to the segment word of the far call in memory. DOS then adds 2130H to the word at that location, producing the machine code 9A 80 09 50 27, or CALL FAR 2750:0980 (see Figure 11).
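The fix-up described above can be sketched in Python. This is an illustration of the idea, not DOS’s own code: `relocate` is a hypothetical name, and the sketch assumes the Load Module has been copied into a `bytearray` that stands for memory beginning at load_segment:0000. Running it on the example reproduces the relocated bytes.

```python
import struct
from typing import Iterable, Tuple


def relocate(image: bytearray, pointers: Iterable[Tuple[int, int]], load_segment: int) -> None:
    """Add load_segment to every segment word named by the relocation pointers.

    Each pointer is a (segment, offset) pair relative to the start of the module.
    """
    for segment, offset in pointers:
        where = segment * 0x10 + offset  # byte position of the segment word
        (word,) = struct.unpack_from("<H", image, where)
        # The addition wraps around at 16 bits, like a word-sized ADD.
        struct.pack_into("<H", image, where, (word + load_segment) & 0xFFFF)


# The example from the text: CALL FAR 0620:0980 at 0000:0150, pointer 0000:0153.
module = bytearray(0x200)
module[0x150:0x155] = bytes([0x9A, 0x80, 0x09, 0x20, 0x06])
relocate(module, [(0x0000, 0x0153)], 0x2130)
print(module[0x150:0x155].hex(" ").upper())  # 9A 80 09 50 27
```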
Note that a COM program requires none of these calisthenics, since it contains no segment references for DOS to adjust. Thus DOS just has to set all the segment registers to one value before passing control to the program.
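The difference shows up in the register values each kind of program starts with. The sketch below assumes the standard DOS conventions that the 256-byte PSP sits immediately below the Load Module, that ds and es point at the PSP for both kinds of program, and that a COM program starts at offset 100H. The function names and the Initial ss and sp values in the usage lines are hypothetical.

```python
from typing import Dict

PSP_PARAGRAPHS = 0x10  # the 256-byte Program Segment Prefix is 10H paragraphs long


def exe_entry_registers(psp_segment: int, initial_cs: int, initial_ip: int,
                        initial_ss: int, initial_sp: int) -> Dict[str, int]:
    """EXE: cs and ss are relocated by the load segment; ds and es hold the PSP."""
    load_segment = psp_segment + PSP_PARAGRAPHS  # the Load Module follows the PSP
    return {
        "cs": (load_segment + initial_cs) & 0xFFFF, "ip": initial_ip,
        "ss": (load_segment + initial_ss) & 0xFFFF, "sp": initial_sp,
        "ds": psp_segment, "es": psp_segment,
    }


def com_entry_registers(psp_segment: int) -> Dict[str, int]:
    """COM: every segment register holds the PSP segment; execution starts at 100H."""
    return {"cs": psp_segment, "ds": psp_segment, "es": psp_segment,
            "ss": psp_segment, "ip": 0x0100}


# A PSP at 2120H puts the Load Module at 2130H, as in Figure 11.
exe = exe_entry_registers(0x2120, initial_cs=0x0000, initial_ip=0x0000,
                          initial_ss=0x0620, initial_sp=0x0200)
com = com_entry_registers(0x2120)
print({k: f"{v:04X}" for k, v in exe.items()})
print({k: f"{v:04X}" for k, v in com.items()})
```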