Analogous to the integer lookup tables in Section 12.2, floating-point lookup tables are addressed with load instructions that take an offset; the difference is that the values are, in most cases, single-precision floating-point numbers. Instead of an LDR instruction, we use a VLDR instruction to move data into a floating-point register, for example
        VLDR.F  s2, [r1, #20]       ; offset is a multiple of 4
EXAMPLE 12.2
In this example, we set up a constant table with the label ConstantTable and load its address into register r1, which will serve as an index register for the VLDR instruction. The offset of a value is its position in the table less one, multiplied by 4, since each data item is 4 bytes in length. Pi, for instance, is the sixth entry, so its offset is (6 - 1) x 4 = 20.
        ADR     r1, ConstantTable   ; Load address of
                                    ; the constant table
        ; load s2 with pi, s3 with 10.0,
        ; and multiply them to s4
        VLDR.F  s2, [r1, #20]       ; load pi to s2
        VLDR.F  s3, [r1, #12]       ; load 10.0 to s3
        VMUL.F  s4, s2, s3
loop    B       loop
        ALIGN
ConstantTable
        DCD     0x3F800000          ; 1.0
        DCD     0x40000000          ; 2.0
        DCD     0x80000000          ; -0.0
        DCD     0x41200000          ; 10.0
        DCD     0x42C80000          ; 100.0
        DCD     0x40490FDB          ; pi
        DCD     0x402DF854          ; e
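The offset arithmetic and the bit patterns in the table can be checked off-target. The following is a minimal Python sketch, not part of the original example; the helper names byte_offset and load_single are hypothetical, and the final multiply is done in Python's double precision, so the product may differ from the hardware's single-precision VMUL result in the last bits.

```python
import struct

# Words copied from ConstantTable, in table order.
CONSTANT_TABLE = [0x3F800000, 0x40000000, 0x80000000, 0x41200000,
                  0x42C80000, 0x40490FDB, 0x402DF854]

def byte_offset(position):
    """Byte offset of the entry at a one-based table position."""
    return (position - 1) * 4

def load_single(table, offset):
    """Model a VLDR: reinterpret the 32-bit word at a byte offset as a float."""
    if offset % 4:
        raise ValueError("offset must be a multiple of 4")
    word = table[offset // 4]
    return struct.unpack('<f', struct.pack('<I', word))[0]

s2 = load_single(CONSTANT_TABLE, byte_offset(6))   # pi, offset 20
s3 = load_single(CONSTANT_TABLE, byte_offset(4))   # 10.0, offset 12
s4 = s2 * s3                                       # double-precision product
print(byte_offset(6), byte_offset(4), s2, s3, s4)
```

The sixth and fourth positions give offsets 20 and 12, matching the immediates used in the two VLDR instructions.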
A common use of the index-with-offset addressing mode is with literal pools, which we encountered in Chapter 6. Literal pools are very useful in floating-point code since many floating-point data items are not candidates for the immediate constant load, which we will discuss in a moment. When the assembler creates a literal pool, it uses the PC as the index register. The Keil assembler allows for constants to be named with labels and used with their label.
EXAMPLE 12.3
The following modification to the example above shows how labels can be used in constant tables, should your assembler support this.
        ; load s2 with pi, s3 with 10.0,
        ; and multiply them to s4
        VLDR.F  s5, C_Pi
        VLDR.F  s6, C_Ten
        VMUL.F  s7, s5, s6
loop    B       loop
        ALIGN
C_One   DCD     0x3F800000          ; 1.0
C_Two   DCD     0x40000000          ; 2.0
C_NZero DCD     0x80000000          ; -0.0
C_Ten   DCD     0x41200000          ; 10.0
C_Hun   DCD     0x42C80000          ; 100.0
C_Pi    DCD     0x40490FDB          ; pi
C_e     DCD     0x402DF854          ; e
Since the labels C_Pi and C_Ten translate to addresses, the assembler calculates the distance between the current value of the Program Counter and each constant, then uses that distance as the offset in a PC-relative VLDR instruction. This technique allows you to place floating-point values in any order, since the tools calculate the offsets for you.
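To make the distance calculation concrete, here is a rough Python sketch of what the tools do for a PC-relative VLDR. It assumes Thumb state, where the base is the instruction address plus 4, rounded down to a word boundary, and the offset must be a multiple of 4 within 1020 bytes either way. The function name and all addresses are hypothetical, chosen only to illustrate a possible layout of the example.

```python
def vldr_literal_offset(instr_addr, label_addr):
    """Offset for a PC-relative VLDR (assumed Thumb-state rules)."""
    base = (instr_addr + 4) & ~3          # PC reads as instr + 4, word aligned
    offset = label_addr - base
    if offset % 4 or not -1020 <= offset <= 1020:
        raise ValueError("label not reachable by a PC-relative VLDR")
    return offset

# Hypothetical layout: two 4-byte VLDRs at 0x100 and 0x104, and the
# constants starting at 0x110, so C_Ten is at 0x11C and C_Pi at 0x124.
pi_offset = vldr_literal_offset(0x100, 0x124)    # VLDR.F s5, C_Pi
ten_offset = vldr_literal_offset(0x104, 0x11C)   # VLDR.F s6, C_Ten
print(pi_offset, ten_offset)
```

With this layout the two loads would carry offsets of 32 and 20 bytes. Reordering the constants changes only the numbers the assembler computes, not the source.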
EXAMPLE 12.4 RECIPROCAL SQUARE ROOT ESTIMATION CODE
In graphics algorithms, the reciprocal square root is a common operation, used frequently in computing the normal of a vector for use in lighting and a host of other operations. The cost of doing the full-precision, floating-point calculation of a square root followed by a division can be expensive. On the Cortex-M4 with floating-point hardware, these operations take 28 cycles, which is a relatively small amount for division and square root. So this example, while not necessarily an optimal choice in all cases, demonstrates the use of a table of half-precision constants and the use of the conversion instruction. The reciprocal square root is calculated by using a conversion table for the significand and adjusting the exponent as needed.
The algorithm proceeds as follows. If we first consider the calculation of a reciprocal square root, the equation is
\[ \frac{1}{\sqrt{x}} = \frac{1}{\sqrt{1.f \cdot 2^{n}}} \quad \text{where } x = 1.f \cdot 2^{n} \]
And we know that
\[ \frac{1}{\sqrt{1.f \cdot 2^{n}}} = \frac{1}{\sqrt{1.f} \cdot \sqrt{2^{n}}} \]
Resulting in
\[ \frac{1}{\sqrt{1.f \cdot 2^{n}}} = 1.g \cdot 2^{-n/2} \]
where 1.g is the table estimate.
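As a worked instance of this result, consider the even-exponent test value 25.28 that appears with the code for this example. The figures are rounded by hand. The last line assumes a 16-entry table indexed by the upper four fraction bits, in which 1.58 falls in the bin that starts at 1.5625.

```latex
\begin{align*}
x &= 25.28 = 1.58 \cdot 2^{4} \\
\frac{1}{\sqrt{x}} &= \frac{1}{\sqrt{1.58}} \cdot 2^{-4/2}
                    \approx 0.7956 \cdot 0.25 \approx 0.1989 \\
\text{table estimate: } 1.g &= 0.8000
  \;\Rightarrow\; 1.g \cdot 2^{-2} = 0.8000 \cdot 0.25 = 0.2000
\end{align*}
```

The estimate differs from the full-precision value by roughly 0.001, which gives a feel for the accuracy a four-bit index provides.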
The sequence of operations then becomes:
1. Load pointers to a table for even exponent input operands and a table for odd exponent operands. The tables take a small number of the most significant fraction bits as the index into the table.
2. The oddness of the exponent is determined by ANDing the exponent with 1, which clears all bits except the LSB, and testing the result with the TEQ instruction. If odd, the exponent is incremented by 1.
3. Divide the exponent by 2 and negate the result. A single-precision scale factor is generated from the computed exponent.
4. Extract the upper 4 bits of the fraction, and if the exponent is odd, use them to index into table RecipSQRTTableOdd for the estimate (the table estimate 1.g), and if the exponent is even, use the table RecipSQRTTableEven for the estimate.
5. Convert the table constant to a single-precision value using the VCVTB instruction, then multiply by the scale factor to get the result.
Note that this code does not check for negative values for x, or whether x is infinity or a NaN. Adding these checks is left as an exercise for the reader.
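The five steps can also be modeled off-target. The following Python sketch mirrors the sequence under the same assumption of a positive, normal input. The function and table names are hypothetical, and the table words are copied from the two tables of this example, with the fourth even entry taken as 0x3B57, the half-precision encoding of roughly 0.9177 that the entry's comment calls for.

```python
import struct

# Half-precision estimates, indexed by the upper 4 fraction bits.
EVEN = [0x3C00, 0x3BC3, 0x3B8B, 0x3B57, 0x3B28, 0x3AFC, 0x3AD3, 0x3AAC,
        0x3A88, 0x3A66, 0x3A47, 0x3A29, 0x3A0C, 0x39F1, 0x39D8, 0x39BF]
ODD = [0x3DA8, 0x3D7C, 0x3D55, 0x3D31, 0x3D0F, 0x3CF0, 0x3CD3, 0x3CB8,
       0x3C9E, 0x3C87, 0x3C70, 0x3C5B, 0x3C47, 0x3C34, 0x3C22, 0x3C10]

def half_to_float(h):
    """Decode a 16-bit half-precision pattern (the role VCVTB plays)."""
    return struct.unpack('<e', struct.pack('<H', h))[0]

def bits_to_float(w):
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack('<f', struct.pack('<I', w))[0]

def recip_sqrt_estimate(word):
    """Estimate 1/sqrt(x) from the bits of a positive, normal single."""
    n = (word >> 23) - 127               # unbiased exponent, sign assumed 0
    odd = n & 1                          # step 2: test the LSB
    if odd:
        n += 1                           # odd: increment to make even
    scale_exp = -(n // 2) + 127          # step 3: halve, negate, re-bias
    scale = bits_to_float((scale_exp & 0xFF) << 23)
    index = (word >> 19) & 0xF           # step 4: upper 4 fraction bits
    table = ODD if odd else EVEN
    return half_to_float(table[index]) * scale   # step 5

print(recip_sqrt_estimate(0x42333333))   # input 44.8, odd exponent
print(recip_sqrt_estimate(0x41CA3D71))   # input 25.28, even exponent
```

For the two test inputs the model lands within about 0.002 of the full-precision values 0.1494 and 0.19889. Like the assembly, it makes no attempt to handle negative inputs, infinities, or NaNs.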
Reset_Handler
        ; Enable the FPU
        ; Code taken from ARM website
        ; CPACR is located at address 0xE000ED88
        LDR.W   r0, =0xE000ED88
        ; Read CPACR
        LDR     r1, [r0]
        ; Set bits 20-23 to enable CP10 and CP11 coprocessors
        ORR     r1, r1, #(0xF << 20)
        ; Write back the modified value to the CPACR
        STR     r1, [r0]
        ; wait for store to complete
        DSB
        ; Reciprocal Square Root Estimate code
        ; r0 holds the address to the odd table
        ADR     r0, RecipSQRTTableOdd
        ; r1 holds the address to the even table
        ADR     r1, RecipSQRTTableEven
        ; Compute the reciprocal square root estimate for a
        ; single precision value X x 2^n as
        ; (X x 2^n)^(-1/2). The estimate table is stored in two
        ; halves, the first for odd exponents
        ; (RecipSQRTTableOdd) and the second for
        ; even exponents (RecipSQRTTableEven).
        VLDR.F  s0, InputValue
        VMOV.F  r2, s0
        ; Process the exponent first - we assume positive input
        MOV     r3, r2              ; exp in r2, frac in r3
        LSR     r2, #23             ; shift the exponent for subtraction
        SUB     r2, #127            ; subtract out the bias
        AND     r4, r2, #1          ; capture the lsb to r4
        TEQ     r4, #1              ; check for odd exponent
        ; Odd Exponent - add 1 before the negate and shift
        ; right operations
        ADDEQ   r2, #1              ; increment to make even
        ; All exponents
        LSR     r2, r2, #1          ; shift right by 1 to divide by 2
        NEG     r2, r2              ; negate
        ADD     r2, #127            ; add in the bias
        LSL     r2, #23             ; return the new exponent
        ; Extract the upper 4 fraction bits for the table lookup
        AND     r3, #0x00780000
        LSR     r3, #18             ; shift so they are *2
        ; Select the table and the table entry based on
        ; the upper fraction bits
        LDRHEQ  r4, [r3, r0]        ; index into the odd table
        LDRHNE  r4, [r3, r1]        ; index into the even table
        VMOV.F  s3, r4              ; copy the selected half-precision
                                    ; estimate
        VCVTB.F32.F16 s4, s3        ; convert the estimate to sp
        VMOV.F  s5, r2              ; move the exp multiplier to s5
        VMUL.F  s6, s5, s4          ; compute the recip estimate
loop    B       loop
        ALIGN
InputValue
; Test values. Uncomment the value to convert
;       DCD     0x42333333          ; 44.8, recip sqrt is 0.1494, odd exp
;       DCD     0x41CA3D71          ; 25.28, recip sqrt is 0.19889, even exp
        ALIGN
RecipSQRTTableEven
        DCW     0x3C00              ; 1.0000 -> 1.0000
        DCW     0x3BC3              ; 1.0625 -> 0.9701
        DCW     0x3B8B              ; 1.1250 -> 0.9428
        DCW     0x3B57              ; 1.1875 -> 0.9177
        DCW     0x3B28              ; 1.2500 -> 0.8944
        DCW     0x3AFC              ; 1.3125 -> 0.8729
        DCW     0x3AD3              ; 1.3750 -> 0.8528
        DCW     0x3AAC              ; 1.4375 -> 0.8340
        DCW     0x3A88              ; 1.5000 -> 0.8165
        DCW     0x3A66              ; 1.5625 -> 0.8000
        DCW     0x3A47              ; 1.6250 -> 0.7845
        DCW     0x3A29              ; 1.6875 -> 0.7698
        DCW     0x3A0C              ; 1.7500 -> 0.7559
        DCW     0x39F1              ; 1.8125 -> 0.7428
        DCW     0x39D8              ; 1.8750 -> 0.7303
        DCW     0x39BF              ; 1.9375 -> 0.7184
        ALIGN
RecipSQRTTableOdd
        DCW     0x3DA8              ; 0.5000 -> 1.4142
        DCW     0x3D7C              ; 0.5322 -> 1.3707
        DCW     0x3D55              ; 0.5625 -> 1.3333
        DCW     0x3D31              ; 0.5938 -> 1.2978
        DCW     0x3D0F              ; 0.6250 -> 1.2649
        DCW     0x3CF0              ; 0.6563 -> 1.2344
        DCW     0x3CD3              ; 0.6875 -> 1.2060
        DCW     0x3CB8              ; 0.7188 -> 1.1795
        DCW     0x3C9E              ; 0.7500 -> 1.1547
        DCW     0x3C87              ; 0.7813 -> 1.1313
        DCW     0x3C70              ; 0.8125 -> 1.1094
        DCW     0x3C5B              ; 0.8438 -> 1.0886
        DCW     0x3C47              ; 0.8750 -> 1.0690
        DCW     0x3C34              ; 0.9063 -> 1.0504
        DCW     0x3C22              ; 0.9375 -> 1.0328
        DCW     0x3C10              ; 0.9688 -> 1.0160
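Table entries of this kind can be generated or spot-checked with a few lines of script. The sketch below is one possible approach and not necessarily how the tables above were produced; the helper names are hypothetical. It relies on the half-precision 'e' format of Python's struct module, which rounds to the nearest representable value.

```python
import struct

def half_bits(value):
    """Encode a float as a 16-bit half-precision pattern (round to nearest)."""
    return struct.unpack('<H', struct.pack('<e', value))[0]

def half_value(bits):
    """Decode a 16-bit half-precision pattern to a float."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

# Candidate entries: the reciprocal square root of the start of a bin.
for start in (1.0, 1.0625, 1.1875, 0.5):
    estimate = start ** -0.5
    print(f"DCW 0x{half_bits(estimate):04X} ; {start:.4f} -> {estimate:.4f}")
```

Evaluating at the start of each bin is only one choice. Using the bin midpoint instead would roughly halve the worst-case error at no cost in table size.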