Structure of Assembly Language Modules

Chapter 4 Assembler Rules and Directives

4.2 Structure of Assembly Language Modules

We begin by examining a very simple module as a starting point. Consider the following code:

        AREA    ARMex, CODE, READONLY
                                ; Name this block of code ARMex
        ENTRY                   ; Mark first instruction to execute
start   MOV     r0, #10         ; Set up parameters
        MOV     r1, #3
        ADD     r0, r0, r1      ; r0 = r0 + r1
stop    B       stop            ; infinite loop
        END                     ; Mark end of file

While the routine may appear a little cryptic, it only does one thing: it adds the numbers 10 and 3 together. The rest of the code consists of directives for the assembler and an instruction at the end to put the processor in an infinite loop. You can see that there is some structure to the lines of code, and the general form of source lines in your assembly files is

{label} {instruction|directive|pseudo-instruction} {;comment}

where each field in braces is optional. Labels are names that you choose to represent an address somewhere in memory, and while they eventually do need to be translated into a numeric value, as a programmer you simply work with the name throughout your code. The linker will calculate the correct address during the linkage process that follows assembly. Note that a label name can only be defined once in your code, and labels must start at the beginning of the line (there are some assemblers that will allow you to place the label at any point, but they require delimiters such as a colon).

The instructions, directives, and pseudo-instructions (such as ADR, which we will see in Chapter 6) must be preceded by white space, either a tab or any number of spaces, even if you don’t have a label at the beginning. One of the most common mistakes new programmers make is starting an instruction in column one. To make your code more readable, you may use blank lines, since all three sections of the source line are optional. ARM and Thumb instructions available on the ARM7TDMI are from the ARM version 4T instruction set; the Thumb-2 instructions used on the Cortex-M4 are from the v7-M instruction set. All of these can be found in the respective Architectural Reference Manuals, along with their mnemonics and uses. Just to start us off, the ARM instructions for the ARM7TDMI are also listed in Table 4.1, and we’ll slowly introduce the v7-M instructions throughout the text. There are many directives and pseudo-instructions, but we will cover only a handful throughout this chapter to get a sense of what is possible.
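As a small sketch of the layout rules for source lines, the module below shows a line with all three fields, lines with no label, and a comment-only line. It assumes Keil armasm syntax; the area name LineForms and the labels loop and done are made up for illustration.

```armasm
        AREA    LineForms, CODE, READONLY   ; hypothetical area name
        ENTRY
        MOV     r2, #5          ; no label: the instruction is still indented
loop    SUBS    r2, r2, #1      ; label, instruction, and comment
        BNE     loop            ; branch back until r2 reaches zero

; A comment may occupy a whole line, and blank lines are allowed.
done    B       done            ; labels start in column one
        END
```

Moving the MOV to column one would make the assembler treat it as a label rather than as an instruction, which is the mistake described above.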

The current ARM/Thumb assembler language, called Unified Assembler Language (UAL), has superseded earlier versions of both the ARM and Thumb assembler languages (we saw a few Thumb instructions in Chapter 3, and we’ll see more throughout the book, particularly in Chapter 17). To give you some idea of the subtle changes involved, compare the old and UAL formats for a shift, a signed byte load, and a stack operation:

Old ARM format                UAL format

MOV <Rd>, <Rn>, LSL shift     LSL <Rd>, <Rn>, shift
LDR{cond}SB                   LDRSB{cond}
LDMFD sp!,{reglist}           POP {reglist}
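To see two of the UAL forms in context, here is a short sketch that assumes the Keil assembler and an ARM-state target such as the ARM7TDMI. The area name UALForms and the label value are hypothetical, and the DCB directive, which places a byte of data in memory, is borrowed ahead of its formal introduction. The pre-UAL spelling of each instruction appears in its comment.

```armasm
        AREA    UALForms, CODE, READONLY    ; hypothetical area name
        ARM                                 ; assemble as ARM-state code
        ENTRY
        MOV     r1, #3
        LSL     r0, r1, #2      ; UAL shift; pre-UAL: MOV r0, r1, LSL #2
        ADR     r3, value       ; r3 = address of the data byte
        CMP     r0, #12         ; 3 shifted left by 2 is 12, so EQ is true
        LDRSBEQ r2, [r3]        ; UAL; pre-UAL: LDREQSB r2, [r3]
stop    B       stop            ; infinite loop
value   DCB     0xF0            ; sign-extends to 0xFFFFFFF0 when loaded
        END
```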

Code written using UAL can be assembled for ARM, Thumb, or Thumb-2, which is an extension of the Thumb instruction set found on the more recent ARM processors, e.g., Cortex-A8. However, you’re likely to find a great deal of code written using the older format, so be mindful of the changes when you review older programs. Also be aware that a disassembly of your code will show the UAL notations if you are using the RealView tools or Code Composer Studio. You can find more details on UAL formats in the RealView Assembler User’s Guide located in the RVMDK tools.

We’ll examine commented code throughout the book, but in general it is a good idea to document your code as much as possible, with clear statements about the operation of certain lines. Remember that on large projects, you will probably not be the only one reading your code. Guidelines for good comments include the following:

• Don’t comment the obvious. If you’re adding one to a register, don’t write “Register r3 + 1.”

• Use concise language when describing what registers hold or how a function behaves.

• Comment the sections of code where you think another programmer might have a difficult time following your reasoning. Complicated algorithms usually require a deep understanding of the code, and a bug may take days to find without adequate documentation.

• In addition to commenting individual instructions, include a short description of functions, subroutines, or long segments of code.

• Do not abbreviate, if possible.

• Acronyms should be avoided, but this can be difficult sometimes, since peripheral register names tend to be shortened. For example, VIC0_VA7R might not mean much in a comment, so if you use the name in the instruction, describe what the register does.

TABLE 4.1

ARM Version 4T Instruction Set

ADC     ADD     AND     B       BL
BX      CDP     CMN     CMP     EOR
LDC     LDM     LDR     LDRB    LDRBT
LDRH    LDRSB   LDRSH   LDRT    MCR
MLA     MOV     MRC     MRS     MSR
MUL     MVN     ORR     RSB     RSC
SBC     SMLAL   SMULL   STC     STM
STR     STRB    STRBT   STRH    STRT
SUB     SWI[a]  SWP     SWPB    TEQ
TST     UMLAL   UMULL

[a] The SWI instruction was deprecated in the latest version of the ARM Architectural Reference Manual (2007c), so while you should use the SVC instruction, you may still see this instruction in some older code.

If you are using the Keil tools, the first semicolon on a line indicates the beginning of a comment, unless the semicolon is inside a string constant, for example,

abc     SETS    "This is a semicolon;"

Here, a string is assigned to the variable abc, but since the semicolon lies within quotes, there is no comment on this line. The end of the line is the end of the comment, and a comment can occupy the entire line if you wish. The TI assembler will allow you to place either an asterisk (*) or a semicolon in column 1 to denote a comment, or a semicolon anywhere else on the line.
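The comment rules for the Keil tools can be sketched in a few lines. The area name CommentDemo is hypothetical, and the GBLS line is included only because a string variable has to be declared before SETS can assign to it.

```armasm
        AREA    CommentDemo, CODE, READONLY ; hypothetical area name
        GBLS    abc                         ; declare a global string variable
abc     SETS    "This is a semicolon;"      ; the first semicolon outside the quotes starts the comment
; This entire line is a comment.
        ENTRY
stop    B       stop                        ; infinite loop
        END
```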

At some point, you will begin using constants in your assembly, and they are allowed in a handful of formats:

• Decimal, for example, 123

• Hexadecimal, for example, 0x3F

• n_xxx (Keil only), where:

    n is a base between 2 and 9
    xxx is a number in that base
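The numeric formats above can be tried in a minimal Keil-style module such as the following. The area name ConstDemo is hypothetical, and the values were picked only because each one fits in a MOV immediate.

```armasm
        AREA    ConstDemo, CODE, READONLY   ; hypothetical area name
        ENTRY
        MOV     r0, #123        ; decimal
        MOV     r1, #0x3F       ; hexadecimal
        MOV     r2, #2_1010     ; base 2: same value as decimal 10
        MOV     r3, #8_17       ; base 8: same value as decimal 15
stop    B       stop            ; infinite loop
        END
```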

Character constants consist of opening and closing single quotes, enclosing either a single character or an escaped character, using the standard C escape characters (recall that escape characters are those that act as nonprinting characters, such as \n for creating a new line). String constants are contained within double quotes. The standard C escape sequences can be used within string constants, but assemblers handle them differently. For example, in the Keil tools, you could say something like

        MOV     r3, #'A'        ; single character constant
        GBLS    str1            ; set the value of global string variable
str1    SETS    "Hello world!\n"

In the Code Composer Studio tools, you might say

        .string "Hello world!"

which places 8-bit characters in the string into a section of code, but the .string directive neither adds a NUL character at the end of the characters nor interprets escape characters. Instead, you could say

        .cstring "Hello world!\n"

which both adds the NUL character for you and correctly interprets the \n escape character at the end.

Before we move into directives, we need to cover a few housekeeping rules. For the Keil tools, there are case rules associated with your commands, so while you can write the instruction mnemonics, directives, and symbolic register names in either uppercase or lowercase, you cannot mix cases within a single name. For example, ADD and add are acceptable, but Add is not. When it comes to mnemonics, the TI assembler is case-insensitive.
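A minimal sketch of the Keil case rule follows; the area name CaseDemo is hypothetical. The mixed-case line is left commented out because the assembler would not accept it.

```armasm
        AREA    CaseDemo, CODE, READONLY    ; hypothetical area name
        ENTRY
        MOV     r0, #10         ; uppercase mnemonic, lowercase register
        mov     R1, #3          ; lowercase mnemonic, uppercase register
        ADD     r0, r0, r1      ; accepted
        add     r0, r0, r1      ; accepted
;       Add     r0, r0, r1      ; mixed-case mnemonic: not accepted
stop    B       stop            ; infinite loop
        END
```

The rule applies to each name separately, so an uppercase mnemonic can share a line with lowercase register names, as in the module at the start of this section.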

To make the source file easier to read, the Keil tools allow you to split up a single line into several lines by placing a backslash character (\) at the end of a line. If you had a long expression, you might write

ISR_Stack_Size  EQU     (UND_Stack_Size + SVC_Stack_Size + ABT_Stack_Size + \
                         FIQ_Stack_Size + IRQ_Stack_Size)

There must not be any other characters following the backslash, such as a space or a tab. The end-of-line sequence is treated as a white space by the assembler.

Using the Keil tools, you may have up to 4095 characters for any given line, including any extensions using backslashes. The TI tools only allow 400 characters per line; anything longer is truncated. For either tool, keep the lines relatively short for easier reading!
