
Chapter 11 Floating-Point Data-Processing Instructions

11.4 Flags and Their Use

As we saw in Chapter 2, the various Program Status Registers hold the flags and control fields for the integer instructions. Recall from Chapter 9 how the Floating-Point Status and Control Register, the FPSCR, performs the same function for the FPU. One difference between the integer handling of the flags and that of the FPU is in the operations that can set the flags. Only the two compare instructions, VCMP and VCMPE, can set the flags for the FPU. None of the arithmetic operations are capable of setting flags. In other words, there is no S variant for floating-point instructions as there is for integer instructions. As a result, the flags are much simpler in the FPU than their integer counterparts. However, the C and V flags are redefined to indicate that one or both operands in the comparison is a NaN. The use of the V flag in integer operations to indicate a format overflow is not necessary in floating-point.

11.4.1 Comparison Instructions

The VCMP and VCMPE instructions perform a subtraction of the second operand from the first and record the flag information, but not the result. The two instructions differ in their handling of NaNs. The VCMPE instruction will set the Invalid Operation flag if either of the operands is a NaN, while the VCMP instruction does so only when one or more operands are sNaNs. The check for NaNs is done first, and if neither operand is a NaN, the comparison is made between the two operands. As we mentioned in Chapter 9, infinities are treated in an affine sense, that is,

−infinity < all finite numbers < +infinity

which is what we would expect. If we compare a normal number and a positive infinity, we expect the comparison to show the infinity is greater than the normal number.

Likewise, a comparison of a negative infinity with any value, other than a negative infinity or a NaN, will show the negative infinity is less than the other operand.

The VCMP and VCMPE instructions may be used to compare two values or compare one value with zero. The format of the instruction is

VCMP{E}{<cond>}.F32 <Sd>, <Sm>

VCMP{E}{<cond>}.F32 <Sd>, #0.0

The VCMP instruction will set the Invalid Operation status bit (IOC) if either operand is an sNaN. The VCMPE instruction sets the IOC if either operand is a NaN, whether the NaN is signaling or quiet. The flags are set according to Table 11.2.

11.4.2 The N Flag

The N flag is set only when the first operand is numerically smaller than the second operand. Since an overflow is recorded in the OFC status bit, there is no need for the N flag in detecting an overflow condition as in integer arithmetic.

11.4.3 The Z Flag

The Z flag is set only when the first and second operands are not NaN and compare exactly. There is one exception to this rule, and that involves zeros. The positive zero and negative zero will compare equal. That is, when both operands are zero, the signs of the two zeros are ignored.

11.4.4 The C Flag

The C flag is set in two cases. The first is when the first operand is equal to or larger than the second operand, and the second is when either operand is NaN.

11.4.5 The V Flag

The V flag is set only when a comparison is unordered, that is, when a NaN is one or both of the comparison operands.
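
Because a NaN is unordered even when compared with itself, the V flag offers a simple NaN test. The fragment below is a minimal sketch rather than code from the text: it assumes the value to test is in s0, uses r0 as a hypothetical result register, and relies on the VMRS transfer described in Section 11.4.6. VCMP is used instead of VCMPE so that a quiet NaN does not set the IOC bit.

```asm
        VCMP.F32 s0, s0             ; unordered only if s0 is a NaN
        VMRS     APSR_nzcv, FPSCR   ; copy N, Z, C, V to the APSR
        MOVVS    r0, #1             ; V = 1: s0 is a NaN
        MOVVC    r0, #0             ; V = 0: s0 is a number (compares equal)
```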

EXAMPLE 11.1

The comparisons in Table 11.3 show the operation of the Cortex-M4 compare instructions.

TABLE 11.2

Floating-Point Status Flags

Comparison Result    N  Z  C  V
Less than            1  0  0  0
Equal                0  1  1  0
Greater than         0  0  1  0
Unordered            0  0  1  1

TABLE 11.3

Example Compare Operations and Status Flag Settings

Operands                    Flags
Sd           Sm             N  Z  C  V    Notes
0x3f800001   0x3f800000     0  0  1  0    Sd > Sm
0x3f800000   0x3f800000     0  1  1  0    Sd == Sm
0x3f800000   0x3f800001     1  0  0  0    Sd < Sm
0xcfffffff   0x3f800000     1  0  0  0    Sd < Sm
0x7fc00000   0x3f800000     0  0  1  1    Sd is qNaN
0x40000000   0x7f800001     0  0  1  1    Sm is sNaN

11.4.6 Predicated Instructions, or the Use of the Flags

The flags in the FPU may be accessed by a read of the FPSCR and tested in an integer register. The most common use for these flags is to enable predicated operation, as was covered in Chapter 8. Recall that the flag bits used in the determination of whether the predicate is satisfied are the flag bits in the APSR. To use the FPU flags, a VMRS instruction must be executed to move the flags in the FPSCR to the APSR.

The format of the VMRS is

VMRS{<cond>} <Rt>, FPSCR

The destination can be any ARM register, r0 to r14, but r13 and r14 are not reasonable choices. To replace the NZCV flag bits in the APSR, the <Rt> field would contain “APSR_nzcv.” This operation transfers the FPSCR flags to the APSR, and any predicated instruction will be executed or skipped based on the FPSCR flags until these flags are changed by any of the operations covered in Chapter 7. When using the flags, the predicates are the same as those for integer operations, as seen in Chapter 8 (see Table 8.1).

EXAMPLE 11.2

Transfer the flag bits in the FPSCR to the APSR.

Solution

The transfer is made with a VMRS instruction, with the destination APSR_nzcv:

VMRS APSR_nzcv, FPSCR

VMRS is what is known as a serializing instruction. It must wait until all other instructions have completed and the register file is updated to ensure any instruction that could alter the flag bits has completed. Other serializing instructions include the counterpart instruction, VMSR, which overwrites the FPSCR with the contents of an ARM register. This instruction is serializing to ensure changes to the FPSCR do not affect instructions that were issued before the VMSR but have not yet completed.
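
Once the flags are in the APSR, the N flag alone identifies a less-than result, so a comparison with zero can drive a single predicated instruction. The following sketch replaces a negative value with +0.0. The registers s0 and r1 are assumptions chosen for illustration.

```asm
        MOV      r1, #0             ; bit pattern of +0.0
        VCMP.F32 s0, #0.0           ; compare s0 with zero
        VMRS     APSR_nzcv, FPSCR   ; copy the flags to the APSR
        VMOVMI   s0, r1             ; N = 1 (s0 < 0): replace s0 with +0.0
```

A NaN in s0 is left untouched, since an unordered result clears N, and −0.0 is also left alone because it compares equal to zero.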

To modify the FPSCR, for example, to change the rounding mode, the new value must be read from memory or the new rounding mode inserted into the current FPSCR value. To change the current FPSCR value, first move it into an ARM register, modify the ARM register, and then use the VMSR instruction to move the new value back to the FPSCR. The modification is done using the integer Boolean operations. The format for the VMSR instruction is

VMSR{<cond>} FPSCR, <Rt>
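
When the two bits of the new rounding mode differ, the rounding mode field must be cleared before the new pattern is inserted. The sequence below is a minimal sketch of this read-modify-write. It assumes the rounding mode field occupies FPSCR bits 23 and 22 and that the pattern 0b01 selects roundTowardPositive (confirm the encoding against the table in Chapter 9). The register r2 is an arbitrary scratch register.

```asm
        VMRS     r2, FPSCR            ; copy the FPSCR to r2
        BIC      r2, r2, #0x00c00000  ; clear both rounding mode bits
        ORR      r2, r2, #0x00400000  ; set bit 22: rounding mode = 0b01
        VMSR     FPSCR, r2            ; write the new value back to the FPU
```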

EXAMPLE 11.3

Set the rounding mode to roundTowardZero.

Solution

The rounding mode bits are FPSCR[23:22], and the patterns for the rounding mode selection were shown in Chapter 9. To set the rounding mode to roundTowardZero, bits [23:22] must be set to 0b11. Modifying the FPSCR is done using integer bit manipulation instructions, but the FPSCR must first be copied to an ARM register by the VMRS instruction. The bits can be ORed in using an ORR immediate instruction, and the new FPSCR written to the FPU with the VMSR instruction. The code sequence is below.

        VMRS r2, FPSCR              ; copy the FPSCR to r2
        ORR  r2, r2, #0x00c00000    ; force bits [23:22] to 0b11
        VMSR FPSCR, r2              ; copy new FPSCR to FPU

Figure 11.1 shows the register window in the Keil tools after this code has run, with the change in the FPSCR.

To set the rounding mode back to RN, the following code can be used:

        VMRS r2, FPSCR              ; copy the FPSCR to r2
        BIC  r2, r2, #0x00c00000    ; clear bits [23:22]
        VMSR FPSCR, r2              ; copy new FPSCR to FPU

EXAMPLE 11.4

Find the largest value in four FPU registers.

Solution

Assume registers s4, s5, s6, and s7 contain four single-precision values. The VCMP.F32 instruction performs the compares and sets the flags in the FPSCR. These flags are moved to the APSR with the VMRS instruction targeting APSR_nzcv as the destination; the remaining bits in the APSR are unchanged. This allows for predicated operations to be performed based on the latest floating-point comparison.

FIGURE 11.1 FPSCR contents after the rounding mode change.

        ; Find the largest value in four FPU registers
        ; s4-s7. Use register s8 as the largest value register.
        ; First, compare register s4 to s5, and copy the larger to s8.
        ; Then compare s6 to s8, and copy s6 to s8 if it is
        ; larger. Finally, compare s7 to s8, copying s7 to s8 if
        ; it is the larger.

        ; Set up the contents of registers s4-s7 using the VLDR
        ; pseudo-instruction
        VLDR.F32 s4, =45.78e5
        VLDR.F32 s5, =-0.034
        VLDR.F32 s6, =1.25e8
        VLDR.F32 s7, =-3.5e10

        ; The comparisons use the VCMP instruction, and the status
        ; bits are copied to the APSR. Predicated operations perform
        ; the copies.

        ; First, compare s4 and s5, and copy the larger
        ; to s8. The GT is true if the compare is signed >,
        ; and the LE is true if the compare is signed <=.
        VCMP.F32   s4, s5            ; compare s4 and s5
        VMRS       APSR_nzcv, FPSCR  ; copy only the flags to APSR
        VMOVGT.F32 s8, s4            ; copy s4 to s8 if larger than s5
        VMOVLE.F32 s8, s5            ; copy s5 if larger or equal to s4

        ; Next, compare s6 with the new largest. This time only
        ; move s6 if s6 is greater than s8.
        VCMP.F32   s6, s8            ; compare s6 and the new larger
        VMRS       APSR_nzcv, FPSCR  ; copy only the flags to APSR
        VMOVGT.F32 s8, s6            ; copy s6 to s8 if new largest

        ; Finally, compare s7 with the largest. As above, only
        ; move s7 if it is greater than s8.
        VCMP.F32   s7, s8            ; compare s7 and the new larger
        VMRS       APSR_nzcv, FPSCR  ; copy only the flags to APSR
        VMOVGT.F32 s8, s7            ; copy s7 to s8 if new largest

        ; The largest of the 4 registers is now in register s8.
Exit    B          Exit
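
The same compare, transfer, and predicated-move pattern extends to values held in memory. The loop below is a sketch under stated assumptions, not part of the original example: r0 is assumed to hold the address of an array of single-precision values, r1 the number of elements (at least one), and the labels Loop and Done are hypothetical. VLDMIA with writeback is used to step through the array because VLDR has no post-indexed form.

```asm
        VLDMIA   r0!, {s8}          ; first element is the initial largest
        SUBS     r1, r1, #1         ; one element consumed
        BEQ      Done               ; only one element: s8 is the answer
Loop    VLDMIA   r0!, {s4}          ; load the next element, advance r0
        VCMP.F32 s4, s8             ; compare it with the current largest
        VMRS     APSR_nzcv, FPSCR   ; copy the flags to the APSR
        VMOVGT.F32 s8, s4           ; keep it if it is larger
        SUBS     r1, r1, #1         ; count down (this overwrites the flags)
        BNE      Loop
Done    B        Done               ; the largest value is now in s8
```

Since GT is false for an unordered result, a NaN anywhere after the first element is never selected.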

11.4.7 A Word about the IT Instruction

The IT instruction was introduced in Chapter 8. Recall that ARM instructions are predicated, with the AL (Always) predicate the default case, used when an instruction is to be executed regardless of the status bits in the APSR. When execution is to be determined by the status bits, as in the example above, a field mnemonic is appended to the instruction, as in VMOVGT seen above. This is true in the ARM instruction set, but not in the Thumb-2 instruction set, where this functionality is available through the IT instruction. In the disassembly file, the Keil assembler inserted an IT instruction before the VMOVGT and the VMOVLE instructions as shown below.

0x00000034 BFCC      ITE      GT
    63:         VMOVGT.F32 s8, s4    ; copy s4 to s8 if larger than s5
0x00000036 EEB04A42  VMOVGT.F32 s8,s4
    64:         VMOVLE.F32 s8, s5    ; copy s5 if larger or equal to s4
0x0000003A EEB04A62  VMOVLE.F32 s8,s5

Since the GT and LE conditions are opposites, that is, the pair covers all conditions, only a single IT block is needed.

The Keil tools allow for the programmer to write the assembly code as if the instructions are individually predicated, as in the example above. The assembler determines when an IT block is needed, and how many predicated instructions may be part of the IT block. Each IT block can predicate from one to four instructions. It is a very powerful tool and should be used when the result of a compare operation is used to select only a small number of operations.
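
The IT instruction can also be written explicitly, in which case the condition on each instruction in the block must agree with the IT pattern. The fragment below is a sketch that assumes the same s6 and s8 contents as Example 11.4. The use of r0 to record which register supplied the largest value is a hypothetical addition for illustration.

```asm
        VCMP.F32 s6, s8             ; compare s6 with the current largest
        VMRS     APSR_nzcv, FPSCR   ; copy the flags to the APSR
        ITT      GT                 ; next two instructions execute only if GT
        VMOVGT.F32 s8, s6           ; s6 becomes the new largest
        MOVGT    r0, #6             ; hypothetical: note that s6 was chosen
```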
