As in the TIMID virus, the search mechanism can be broken down into two parts: FIND_FILE simply locates possible files to infect, and FILE_OK determines whether a file can be infected.
The FILE_OK procedure will be almost the same as the one in TIMID. It must open the file in question and determine whether it can be infected and make sure it has not already been infected. The only two criteria for determining whether an EXE file can be infected are whether the Overlay Number is zero, and whether it has enough room in its relocation pointer table for two more pointers. The latter requirement is determined by a simple calculation from values stored in the EXE header. If
16 * Header Paragraphs - 4 * Relocation Table Entries - Relocation Table Offset
is greater than or equal to 8 (=4 times the number of relocatables the virus requires), then there is enough room in the relocation pointer table. This calculation is performed by the subroutine REL_ROOM, which is called by FILE_OK.
To determine whether the virus has already infected a file, we put an ID word with a pre-assigned value in the code segment at a fixed offset (say 0). Then, when checking the file, FILE_OK gets the segment from the Initial cs in the EXE header. It uses that with the offset 0 to find the ID word in the load module (provided the virus is there). If the virus has not already infected the file, Initial cs will contain the initial code segment of the host program. Then our calculation will fetch some random word out of the file which probably won’t match the ID word’s required value. In this way FILE_OK will know that the file has not been infected. So FILE_OK stays fairly simple.
However, we want to design a much more sophisticated FIND_FILE procedure than TIMID’s. The procedure in TIMID could only search for files in the current directory to attack. That was fine for starters, but a good virus should be able to leap from directory to directory, and even from drive to drive. Only in this way does a virus stand a reasonable chance of infecting a significant portion of the files on a system, and jumping from system to system.
To search more than one directory, we need a tree search routine. That is a fairly common algorithm in programming. We write a routine FIND_BR, which, given a directory, will search it for an EXE which will pass FILE_OK. If it doesn’t find a file, it will proceed to search for subdirectories of the currently referenced directory. For each subdirectory found, FIND_BR will recursively call itself using the new subdirectory as the directory to perform a search on. In this manner, all of the subdirectories of any given directory may be searched for a file to infect. If one specifies the directory to search as the root directory, then all files on a disk will get searched.
Making the search too long and involved can be a problem, though. A large hard disk can easily contain a hundred subdirectories and thousands of files. When the virus is new to the system it will quickly find an uninfected file that it can attack, so the search will be unnoticeably fast. However, once most of the files on the system are already infected, the virus might make the disk whirr for twenty seconds while examining all of the EXE’s on a given drive to find one to infect. That could be a rather obvious clue that something is wrong.
To minimize the search time, we must truncate the search in such a way that the virus will still stand a reasonable chance of infecting every EXE file on the system. To do that we make use of the typical PC user’s habits. Normally, EXE’s are spread pretty evenly throughout different directories. Users often put frequently used programs in their path, and execute them from different directories. Thus, if our virus searches the current directory, and all of its subdirectories, up to two levels deep, it will stand a good chance of infecting a whole disk. As added insurance, it can also search the root directory and all of its subdirectories up to one level deep. Obviously, the virus will be able to migrate to different drives and directories without searching them specifically, because it will attack files on the current drive when an infected program is executed, and the program to be executed need not be on the current drive.
When coding the FIND_FILE routine, it is convenient to structure it in three levels. First is a master routine FIND_FILE, which decides which subdirectory branches to search. The second level is a routine which will search a specified directory branch to a specified level, FIND_BR. When FIND_BR is called, a directory path is stored as a null-terminated ASCII string in the variable USEFILE, and the depth of the search is specified in LEVEL. At the third level of the search algorithm, one routine searches for EXE files (FINDEXE) and two search for subdirectories (FIRSTDIR and NEXTDIR). The routine that searches for EXE files will call FILE_OK to determine whether each file it finds is infectable, and it will stop everything when it finds a good file. The logic of this searching sequence is illustrated in Figure 12. The code for these routines is also listed in Appendix B.
Figure 12: Logic of the file search routines.