Chapter 11 Floating-Point Data-Processing Instructions
11.8 Putting It All Together: A Coding Example
In this section, we’ll tie everything together by coding a routine for the bisection algorithm, which is a simple method of finding a root (a zero crossing) of a con- tinuous function. The algorithm begins with two points on the function that have opposite signs and computes a third point halfway between the two points. The new third point replaces one of the original points that has the same sign as the function evaluated at that new third point, and the algorithm repeats. In Figure 11.2, the origi- nal points are labeled a and b, and we see that f(a) and f(b) have opposite signs. The computed third point, c, is the result of one iteration. Further iteration will result in a computed new point closer to the true crossing. The algorithm is ended when the computed point is exactly on the zero crossing (f(c) = 0) or the difference between the input point with the same sign as the computed point is below a threshold.
The algorithm is written in pseudo-code as shown below.*
INPUT: Function f, endpoint values a, b, tolerance TOL, maximum iterations NMAX CONDITIONS: a < b, either f(a) < 0 and f(b) > 0 or f(a) > 0 and f(b) < 0
OUTPUT: value which differs from a root of f(x) = 0 by less than TOL N ← 1
While N ≤NMAX {limit iterations to prevent infinite loop c ← (a + b)/2 new midpoint
* The code is taken from the Wikipedia entry on “Bisection method,” taken from http://en.wikipedia.
org/wiki/Bisection_method#CITEREFBurdenFaires1985.
TABLE 11.11
Floating-Point Square Root Instruction Operand and Exception Table
Operand A Result
Possible
Exceptions Notes
+Normal +Normal IXC
−Normal Default NaN IOC Any input below zero results in an invalid operation +Subnormal +Normal IXC
−Subnormal Default NaN IOC Any input below zero results in an invalid operation +Infinity +Infinity None
−Infinity Default NaN IOC
+Zero +Zero None
−Zero −Zero None
NaN NaN IOC If a signaling NaN, IOC is set. NaN should be input NaN
If (f(c) = 0 or (b – a)/2 < TOL then {solution found Output(c)
Stop }
N ← N + 1 increment step counter
If sign(f(c)) = sign(f(a)) then a ← c else b ← c new interval }
Output(“Method failed.”) max number of steps exceeded
; Bisection code
; The algorithm requires the first two points,
; a, and b, to be one below and one above the
; root; that is, f(a) will have an opposite
; sign to f(b). The algorithm computes the midway
; point between a and b, called c, and computes f(c).
; This new point replaces the point with the same
; function result sign. If f(a) is positive and f(c)
; is positive, c replaces a, and the algorithm reruns
; with b and c as points. The algorithm exits when f(c)
; is zero, or (a-b) is less than a threshold value.
; FPU registers
; s6 - the threshold value, 0.0002
; s7 - 2.0, to divide the sum for the new operand
; s8 - operand a
; s9 - operand b
; s10 - the new point, operand c
; s11 - f(a)
; s12 - f(b)
; s13 - f(c)
; ARM registers
; r1 - sign of f(a)
; r2 - sign of f(b)
; r3 - sign of f(c)
; r4 - iteration count
f(x) f(b)
f(c)
f(a)
a c b x
FIGURE 11.2 Bisection method for two initial points and a computed third point.
; Choose 0 and 4 as initial samples
; Initialize the divisor register and threshold VMOV.F32 s7, #2.0
VLDR.F32 s6, = 0.0002
; Initialize the operand registers
VSUB.F32 s8, s7, s7 ; a lazy way to create 0.0 VMOV.F32 s9, #4.0
; Initialize our loop counter MOV r4, #0
Loop
; Increment our loop counter ADD r4, r4, #1
; Test a and b for (a-b) < threshold
; Use s11 for the difference, we will overwrite
; it in the eval of operand a
VSUB.F32 s11, s8, s9 ; compute the difference VABS.F32 s11, s11 ; make sure diff is positive VCMP.F32 s11, s6 ; test diff > threshold?
VMRS.F32 APSR_nzcv, FPSCR ; copy status bits to APSR BLS Exit ; if neg or eq, exit
; Evaluate the function for operand a VMOV.F32 s1, s8
BL func VMOV.F32 s11, s0
; Evaluate the function for operand b VMOV.F32 s1, s9
BL func VMOV.F32 s12, s0
; Compute the midpoint for operand c,
; the point halfway between operands
; a and b
VADD.F32 s10, s8, s9 VDIV.F32 s10, s10, s7
; Evaluate the function for operand c VMOV.F32 s1, s10
BL func VMOV.F32 s13, s0
; Test the signs of the three operands
VCMP.F32 s11, #0 ; set status bits only on operand a VMRS.F32 r1, FPSCR
AND r1, r1, #0x80000000 ; isolate the N status bit VCMP.F32 s12, #0 ; set status bits only on operand b VMRS.F32 r2, FPSCR
AND r2, r2, #0x80000000 ; isolate the N status bit VCMP.F32 s13, #0 ; set status bits only on operand c VMRS.F32 r3, FPSCR
TST r3, #0x4000000 ; test for zero
BEQ Exit ; the value in s10 is exactly
; the root
AND r3, r3, #0x80000000 ; isolate the N status bit
; If sign(a) ! = sign(c), copy s10 into s9;
; else sign(b) ! = sign(c), copy s10 into s8;
EORS r1, r3 ; test if sign(a) = sign(c) VMOVEQ.F32 s8, s10 ; if 0, copy c to a
BLEQ Loop ; run it again with a new a VMOV.F32 s9, s10 ; if not a, then copy c into b BL Loop ; run it again with a new b Exit
B Exit
; Test functions
; Assumes ATPCS - regs s0-s15 parameters and/or scratch
; Register usage:
; s0 - return result
; s1 - input operand
; s2 - scratch
; s3 - scratch func
; Function - x^3 + 2x - 8
VMOV.F32 s0, #2.0 ; use s0 to hold 2.0 temporarily VMUL.F32 s2, s1, s1 ; initial squaring of input VMUL.F32 s3, s1, s0 ; multiply input by 2
VMOV.F32 s0, #8.0 ; use s0 to hold 8.0 temporarily VMUL.F32 s2, s2, s1 ; finish cubing of input
VSUB.F32 s3, s3, s0 ; subtract off 8.0 from 2x VADD.F32 s0, s2, s3 ; add in x^3 to return reg
BX lr ; return