Number representation
Floating-point numbers
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) specifies the binary format for 32-bit single-precision floating-point numbers, which consist of three components: the sign bit, the exponent, and the mantissa. The sign bit indicates the sign, with 0 representing a positive number and 1 a negative number. The exponent, an 8-bit value, ranges from -126 to 127, while the mantissa holds the normalized binary significand, which is multiplied by 2 raised to the power given by the exponent.
Example 2.1 Consider the representation of the float number 118.625.
The number 118.625 is positive, so its sign bit is 0. In binary it is 1110110.101. Normalizing gives 1.110110101 x 2^6, where the exponent is 6 and the mantissa is 1.110110101. To account for the bias, the exponent is adjusted to 133 by adding 127; its binary representation is 10000101.
Table 2.1: 32-bit binary representation of the float number 118.625

Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)
0            | 10000101          | 11011010100000000000000
Thus, the floating-point encoded value of 118.625 is 01000010111011010100000000000000.
Binary values are often referred to by their hexadecimal equivalent. In this case, the hexadecimal value is 42ED4000.
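The encoding above can be checked mechanically. The following is a minimal Java sketch (the class and method names are ours) that extracts the raw IEEE 754 bit pattern of a float:

```java
public class FloatBits {
    // Return the hexadecimal form of the raw IEEE 754 bit pattern of f.
    public static String hexBits(float f) {
        return Integer.toHexString(Float.floatToIntBits(f));
    }

    public static void main(String[] args) {
        // 118.625f encodes as sign 0, exponent 10000101, mantissa 110110101...0
        System.out.println(hexBits(118.625f)); // prints 42ed4000
    }
}
```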
Fixed-point numbers
Fixed-point representation involves selecting a specific radix point, or decimal point, which allows for a predetermined number of bits to the left (integer bits) and right (fractional bits) of the point. This format is denoted as (b, m, n), where b represents the base, m the number of integer bits, and n the number of fractional bits. In this thesis, the default fixed-point format used is (2, 11, 4), also known as binary fixed-point, which is favored for its computational efficiency.
In Example 2.2, we consider a 16-bit fractional number structured in an 8.8 format, consisting of 8 magnitude bits and 8 radix bits. Like most signed integers, fixed-point numbers use two's complement binary representation; for simplicity, we focus on a positive number in this instance.
Example 2.2 16-bit fixed-point representation of 118.625
To encode the number 118.625, we first determine the integer bits: the binary representation of 118 is 01110110, forming the upper 8 bits of the 16-bit number. The fractional part, 0.625, is scaled by 2^n; since 0.625 × 256 equals 160, the fractional bits are the binary representation of 160, which is 10100000. Consequently, the complete binary representation of 118.625 is 0111011010100000. This value is commonly written in hexadecimal as 76A0.
Table 2.2: 16-bit fixed-point binary representation of 118.625

Integer part (8 bits) | Fraction (8 bits)
01110110              | 10100000
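The same conversion can be written as a short Java sketch (names are ours): encoding multiplies by 2^8 and rounds, decoding divides back.

```java
public class Fixed88 {
    // Encode a non-negative real value into 8.8 fixed-point by scaling by 2^8.
    public static int encode(double v) {
        return (int) Math.round(v * 256.0);
    }

    // Decode a raw 8.8 value back to a double for checking.
    public static double decode(int raw) {
        return raw / 256.0;
    }

    public static void main(String[] args) {
        System.out.printf("%04X%n", encode(118.625)); // prints 76A0
    }
}
```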
Fixed-point representation offers a significant advantage for real numbers by aligning with the arithmetic principles of integers. This compatibility allows fixed-point numbers to leverage the existing optimizations in the Arithmetic Logic Unit (ALU) of most microprocessors, eliminating the need for extra libraries or hardware logic. In processors lacking a Floating Point Unit (FPU), like the Analog Devices Blackfin Processor, fixed-point representation can lead to more efficient embedded code, especially for mathematically intensive operations.
Fixed-point numbers have a significant limitation in that they can only represent a restricted range of values, making them prone to common computational inaccuracies. For instance, in the 8.8 notation, the representable range is from -128.0 to +127.99609375. When performing an operation like 100 + 100, the result exceeds this range, a situation known as overflow. Typically, when overflow occurs, the values are either saturated or truncated, so the output is capped at the maximum representable value.
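Saturation on overflow can be sketched in Java for the 8.8 format (raw values held in ints; class and constant names are ours):

```java
public class Sat88 {
    static final int MAX = 0x7FFF;  // +127.99609375 in 8.8
    static final int MIN = -0x8000; // -128.0 in 8.8

    // Saturating addition of two raw 8.8 fixed-point values.
    public static int addSat(int a, int b) {
        int s = a + b; // exact in 32-bit int for 16-bit operands
        if (s > MAX) return MAX;
        if (s < MIN) return MIN;
        return s;
    }
}
```

For example, adding the 8.8 encodings of 100 and 100 saturates to MAX, i.e. +127.99609375, rather than wrapping around.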
Assume we use the fixed-point format (2,11,4) and we have the floating-point number 1001.010111. Then the corresponding fixed-point number is 1001.0101 and the round-off error is 0.000011.
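Assuming truncation toward negative infinity, the rounding step for format (2,11,4) can be sketched as follows (names are ours; 1001.010111 in binary is 9.359375 in decimal):

```java
public class Fx2_11_4 {
    // Truncate x to n = 4 fractional bits, as in format (2,11,4).
    public static double toFixed(double x) {
        return Math.floor(x * 16.0) / 16.0;
    }

    public static double roundOffError(double x) {
        return x - toFixed(x);
    }

    public static void main(String[] args) {
        double x = 9.359375;                  // 1001.010111 in binary
        System.out.println(toFixed(x));       // prints 9.3125  (1001.0101)
        System.out.println(roundOffError(x)); // prints 0.046875 (0.000011)
    }
}
```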
In fixed-point computation, lost bits can occur in two forms: overflow errors and rounding errors This article focuses solely on rounding errors, as they are more challenging to identify.
Symbolic execution with Symbolic PathFinder
Symbolic execution
Symbolic execution, a concept introduced thirty years ago, has gained practical application in recent years and is now the foundation for several popular open-source testing tools, including NASA's SPF for Java, UIUC's CUTE and jCUTE, Stanford's KLEE, and UC Berkeley's CREST. These tools are also used in industrial settings by major companies such as Microsoft, IBM, NASA, and Fujitsu, demonstrating the growing importance of symbolic execution in software testing.
Symbolic execution is a technique that uses symbolic values as input instead of actual data, allowing program variables to be represented as symbolic expressions. This approach enables a program to explore all feasible paths during execution, in contrast to concrete execution. In software testing, symbolic execution generates test inputs for each execution path of a program. The various execution paths can be visualized as an execution tree, illustrating the different possibilities within the program's flow.
1 http://babelfish.arc.nasa.gov/trac/jpf/wiki/projects/jpf-symbc
2 http://osl.cs.uiuc.edu/~ksen/cute/
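As a toy illustration (our own example, not taken from the cited tools), symbolically executing the method below with symbolic inputs a and b yields two path conditions, a > b and a <= b, each paired with its own symbolic result:

```java
public class PathDemo {
    // Symbolic execution explores both branches:
    //   path 1: a > b   with symbolic result a - b
    //   path 2: a <= b  with symbolic result b - a
    public static int dist(int a, int b) {
        if (a > b) {
            return a - b;
        }
        return b - a;
    }
}
```

A test generator would then pick one concrete input per path condition, e.g. (5, 2) for the first path and (2, 5) for the second.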
Symbolic execution unquestionably has many advantages; however, it also has its own weaknesses:
• Path explosion: Symbolic execution does not scale well on programs with a huge number of paths, which grows exponentially with the number of static branches in the code.
• Constraint complexity: This is one of the main reasons symbolic execution fails to scale on large programs, because solvers cannot find solutions to overly complex queries.
Symbolic PathFinder
Symbolic PathFinder is an extension of Java PathFinder (JPF) designed for symbolic execution of Java programs. This extension integrates symbolic execution with model checking and constraint solving to automatically generate test inputs, ensuring high code coverage and effective error detection in programs with unspecified inputs.
In this thesis, an extended version of SPF plays an important role in the first step, generating symbolic round-off errors from Java programs.
Related works
Overflow and rounding error analysis has been a critical area of study since the inception of computer science, as both fixed-point and floating-point representations present unique challenges While much research has addressed both overflow and rounding errors, this work specifically emphasizes rounding error due to its complexity However, the concepts discussed can also be applied to overflow error situations.
This work considers three types of overflow and rounding errors: real numbers versus floating-point, real numbers versus fixed-point, and floating-point versus fixed-point numbers. While previous studies have primarily concentrated on rounding errors involving real results, we specifically address the rounding errors between floating-point and fixed-point numbers.
Ngoc and Ogawa recently developed a tool named C ANAlyzer (CANA) for analyzing overflow and rounding errors in programs. The tool reports the rounding error ranges of variables at various points in the program and issues warnings when overflow errors are detected. They introduce a novel interval, the extended affine interval (EAI), to estimate rounding error ranges more accurately, improving on the Classical Interval (CI) and Affine Interval (AI) methods.
• Classical Interval was first introduced in the 1960s by Moore [11] as a method of putting bounds on round-off errors in mathematical computations. CI is simple but imprecise.
• Affine Interval provides higher precision because it introduces symbolic manipulations on noise symbols to handle correlations between variables, which CI cannot capture.
EAI offers distinct advantages over CI and AI, primarily by eliminating the introduction of new noise symbols, resulting in a more compact representation. Additionally, EAI is more precise than CI because it keeps information related to uncertainty. However, it still retains a level of imprecision similar to that of our approach.
In this chapter, we begin by presenting a symbolic computation method that addresses round-off errors, drawing inspiration from previous works [13, 9]. We then extend the discussion to example programs, reducing them to collections of arithmetic expressions with specific constraints.
Symbolic round-off error
Let R denote the set of all real numbers, L the set of all floating-point numbers, and I the set of all fixed-point numbers. Both L and I are finite because they are represented with a limited number of bits. For practical purposes, we assume that the bit count of the fixed-point format does not exceed the number of significant bits of the floating-point representation, which gives I ⊂ L ⊂ R.
We consider a real arithmetic function \( y = f(x_1, \ldots, x_n) \), where \( x_1, \ldots, x_n \) and \( y \) are real numbers and \( f \) is an arithmetic expression over these variables. For readability, we write \( x' \) for the rounded value of \( x \) in \( L \), and \( x'' \) for the value of \( x' \) further rounded into \( I \).
Arithmetic operations on floating-point and fixed-point numbers can differ in precision We denote the floating-point version as \( f_l \) and the fixed-point version as \( f_i \), where real arithmetic operations are substituted with their corresponding counterparts.
Chapter 3: Symbolic round-o error operations inLandI, respectively We denote the operations in real as+,−,×,÷, in oating-point as {+ l ,− l ,× l ,÷ l } and in xed-point as{+ i ,− i ,× i ,÷ i }.
The round-off error analysis in the literature usually focuses on the largest error between \( f \) and \( f_l \):
\[
\max_{x_j \in \mathbb{R},\, j = 1,\ldots,n} \left| f(x_1, \ldots, x_n) - f_l(x'_1, \ldots, x'_n) \right| \tag{3.1}
\]
In our setting we focus on the largest round-off error between \( f_l \) and \( f_i \):
\[
\max_{x'_j \in L,\, j = 1,\ldots,n} \left| f_l(x'_1, \ldots, x'_n) - f_i(x''_1, \ldots, x''_n) \right| \tag{3.2}
\]
Alternatively, one may want to check whether the round-off error exceeds a given threshold \( \theta \):
\[
\exists x'_1, \ldots, x'_n \in L : \left| f_l(x'_1, \ldots, x'_n) - f_i(x''_1, \ldots, x''_n) \right| > \theta \tag{3.3}
\]
In this analysis, we assume that the fixed-point function is not manually optimized for fixed-point computation, so the evaluations of \( f_l \) and \( f_i \) proceed identically. We also assume that the scaling of variables in \( f_i \) is consistent, so that all values and variables adhere to the same fixed-point format. Furthermore, as outlined in Chapter 2, we assume that both the floating-point and fixed-point representations use the same base, with a specified fixed-point format such as (2,11,4).
Due to the differences between the floating-point and fixed-point representations, a value \( x' \in L \) typically must be rounded to a corresponding value \( x'' \in I \). This rounding is a non-decreasing monotonic function \( r \) from \( L \) to \( I \). The conversion error \( x' -_l r(x') \) (that is, \( x' -_l x'' \)) is assumed to be in \( L \) under the given assumptions.
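One concrete choice of r (ours; the text only requires a non-decreasing monotonic function) is round-to-nearest for the (2,11,4) format:

```java
public class RoundFn {
    // r : L -> I, round to the nearest multiple of 2^-4 (non-decreasing).
    public static double r(double x) {
        return Math.round(x * 16.0) / 16.0;
    }

    // Conversion error x' -l r(x'); note that it can be negative.
    public static double convError(double x) {
        return x - r(x);
    }
}
```

For instance, r(9.359375) = 9.375, so the conversion error is -0.015625, a negative value.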
In floating-point evaluations, it is crucial to monitor the error, as it plays a significant role in the accuracy of the results However, in fixed-point computations, this error accumulates differently and does not impact the evaluation in the same manner.
To effectively utilize symbolic execution for accurately representing rounding errors, it is essential to track all errors introduced by rounding, their propagation through arithmetic operations, and the additional errors arising from the discrepancies between the arithmetic operations, e.g. ×_l versus ×_i.
To track the error, we denote a floating-point number x by the pair (x_i, x_e), where x_i = r(x) and x_e = x −_l r(x). Note that x_e can be negative, depending on the rounding.
We now define arithmetic operations on these pairs for both floating-point and fixed-point calculations. These operations, denoted +_s, −_s, ×_s, and ÷_s, are defined in a manner consistent with previous studies. The core idea is to track the accumulation of error throughout the computation.
Definition 3.1 Basic symbolic round-off error
\[
(x_i, x_e) +_s (y_i, y_e) = (x_i +_l y_i,\; x_e +_l y_e) \tag{3.4}
\]
\[
(x_i, x_e) -_s (y_i, y_e) = (x_i -_l y_i,\; x_e -_l y_e) \tag{3.5}
\]
\[
(x_i, x_e) \times_s (y_i, y_e) = \big(x_i \times_i y_i,\; (x_i +_l x_e) \times_l (y_i +_l y_e) -_l x_i \times_l y_i +_l re(x_i, y_i)\big) \tag{3.6}
\]
\[
(x_i, x_e) \div_s (y_i, y_e) = \big(x_i \div_i y_i,\; (x_i +_l x_e) \div_l (y_i +_l y_e) -_l x_i \div_l y_i +_l de(x_i, y_i)\big) \tag{3.7}
\]
where \( re(x_i, y_i) = (x_i \times_l y_i) -_l (x_i \times_i y_i) \) (resp. \( de(x_i, y_i) = (x_i \div_l y_i) -_l (x_i \div_i y_i) \)) are the round-off errors between floating-point and fixed-point multiplication (resp. division).
When two fixed-point numbers are multiplied, a round-off error may occur, which is why the round function r is applied in the first component and re(x_i, y_i) appears in the second. Similarly, the definition of ÷_s includes de(x_i, y_i). Although addition and subtraction can lead to overflow errors, we do not address those here.
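To make the bookkeeping concrete, here is a minimal numeric Java sketch of the multiplication rule (names are ours; it assumes truncation and the binary format (2,11,4), whereas the actual tool manipulates symbolic expressions rather than doubles). By construction, the sum of the two components equals the exact floating-point product of the full values:

```java
public class SymMul {
    // Truncate to format (2,11,4): 4 fractional bits.
    static double toFixed(double v) {
        return Math.floor(v * 16.0) / 16.0;
    }

    // (xi, xe) ×s (yi, ye): the first component is the fixed-point product
    // xi ×i yi; the second accumulates the propagated error plus re(xi, yi).
    public static double[] mulS(double xi, double xe, double yi, double ye) {
        double fixProd = toFixed(xi * yi);            // xi ×i yi
        double err = (xi + xe) * (yi + ye) - fixProd; // error component
        return new double[]{fixProd, err};
    }
}
```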
The accumulated error does not always increase, as the following example shows.

Example 3.2 Addition round-off error
For readability, let the fixed-point format be (10, 11, 4) and let x = 1.312543 and y = 2.124567, so that x = (x_i, x_e) = (1.3125, 0.000043) and y = (y_i, y_e) = (2.1246, −0.000033). Applying the above definition for addition, we have:

x +_s y = (1.3125 +_l 2.1246, 0.000043 −_l 0.000033) = (3.4371, 0.00001)
With x, y in Example 3.2, for multiplication, we have:
As in Example 3.2, the multiplication of two fixed-point numbers can produce a rounding error, so an additional value, re(x_i, y_i), is needed in the second component of the pair. This value, like the conversion error, is bounded within a specific range, which we explore in the following section.
Constraints
In Definition 3.1, we define a number using two components, which allows us to create a symbolic representation for the rounding error, represented as the second component This representation is governed by specific rules that constrain the two components.
Let us assume that our fixed-point representation is binary and uses m bits for the integer part and n bits for the fractional part.
The first part, \( x_i \), is constrained by its representation: there exist bits \( d_1, \ldots, d_{m+n} \in \{0, 1\} \) such that
\[
x_i = \sum_{k=1}^{m+n} d_k \cdot 2^{m-k}
\]
(up to sign in two's complement).
The second part, x_e = x −_l r(x), is bounded by the 'unit' of the fixed-point format in base b, namely b^(−n), where the unit is the absolute difference between a fixed-point number and its successor. When rounding to the nearest value, the bound shrinks to half of the unit.
Like x_e, the re() part obeys the same constraint, and our implementation covers it.
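The unit bound on x_e can be sanity-checked numerically (our sketch, assuming the truncation scheme and format (2,11,4); the sampling loop is purely illustrative):

```java
public class ErrBound {
    static final double UNIT = 1.0 / 16.0; // b^-n with b = 2, n = 4

    // Truncation error x - r(x) for format (2,11,4).
    public static double truncError(double x) {
        return x - Math.floor(x * 16.0) / 16.0;
    }

    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(7);
        for (int k = 0; k < 10000; k++) {
            // sample within the 11 integer bits of the format
            double x = (rnd.nextDouble() - 0.5) * 2048.0;
            double e = truncError(x);
            // with truncation, 0 <= xe < 2^-4 always holds
            if (e < 0.0 || e >= UNIT) throw new AssertionError("bound violated at " + x);
        }
        System.out.println("ok");
    }
}
```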
Symbolic round-off error for expressions
In our approach to symbolic execution, input parameters are represented by pairs of two symbols instead of a single symbol, incorporating additional constraints on these symbols.
For an arithmetic expression, we begin by replacing its variables with symbolic pairs, then apply the arithmetic operations as outlined in Definition 3.1. This process yields a pair consisting of a symbolic fixed-point result and a symbolic round-off error; the latter is the input to our next step, identifying global optima. Before proceeding, we first examine the extension of the symbolic round-off error to Java programs.
Symbolic round-off error for programs
We redefine our problem by focusing on a mathematical function in Java, characterized by numeric input and output parameters, initial input ranges, fixed-point format, and a threshold θ Our goal is to ascertain whether any combination of input parameters leads to a discrepancy between the floating-point and fixed-point results that exceeds the specified threshold In line with prior research, we limit our analysis to mathematical functions that do not contain unknown loops, ensuring that each function has a finite number of execution paths that ultimately terminate.
Through standard symbolic execution, we identify a result expression and its corresponding input-parameter constraints for each feasible execution path. Applying the method of Section 3.1 to each of these pairs and conjoining the path conditions yields a symbolic rounding error for every execution path.
Figure 3.4 is the example from [13] that we will use to illustrate our approach.
In this program we use special comments to specify the fixed-point format, the threshold, and the input ranges of the parameters.
/* ... fixed-point format, threshold, and input ranges ... */
double first;

double maintest(double x, double y) {
    if (x > 0)
        first = x * x;
    else
        first = 3 * x;
    first -= y;
    return first;
}
Applications of symbolic round-o error
The symbolic round-off error has diverse applications beyond identifying the largest error. It can be used to check whether a round-off error exceeds a specified threshold, relying on external SMT solvers, since the constraints involved are typically non-linear. As a mathematical function, the symbolic round-off error also provides insight into various properties of the error, such as the significance of different variables in contributing to the error, and it can be visualized. Additionally, it can be employed to compute other metrics related to round-off errors, such as the frequency or density of errors above a defined threshold, or the integral of the error.
In this chapter, we first present our tool's implementation. Next, we give several experimental results, and lastly we bring up some discussion.
Implementation
Our extended Symbolic PathFinder
To automatically generate symbolic output expressions for Java programs, we modify SPF to return the symbolic output expression for each path when symbolically executing a program.
SPF has the high-level overview shown in Figure 2.2. Its inputs include:
• A Java bytecode program (*.class file)
• A JPF configuration file specifying which method should be executed symbolically.
SPF generates a test suite that includes test inputs and sequences, along with counterexamples for any properties that fail. To accomplish this, SPF employs several extension mechanisms of JPF:
• Choice Generators: state space branch points used to generate choices
• Instruction Factories: instruction set semantics which provide the ability to replace the standard execution semantics with a symbolic semantics.
• Attribute Objects: used to store and propagate symbolic information as metadata associated with concrete values and objects.
• Listeners: used for execution monitoring
SPF currently provides two types of listeners, which gather and display information about the path conditions generated during symbolic execution.
• SymbolicListener: shows analyzed path conditions along with the test cases generated by symbolic execution, including test inputs and their expected outputs, in plain text or HTML format.
• SymbolicSequenceListener: shows test sequences (i.e., sequences of method calls) for each path condition encountered.
Our implementation requires the symbolic output expression of each path and its respective path condition, so we modified the SymbolicListener class to achieve the desired results.
In the listener architecture, we use the instructionExecuted() method to be notified whenever a symbolic method is invoked. When the return bytecode instruction of such a method is executed, we generate the path condition and the symbolic output expression corresponding to that path. This information is then published for further use.
Experimental results
Experiment with simple program
In the analysis of the program in Figure 3.4, we derive its symbolic expression in two cases: (x > 0 ∧ x × x − y) and (x ≤ 0 ∧ 3 × x − y). Focusing on the first case and combining it with the input range −1 ≤ x ≤ 3, we arrive at the condition x > 0 ∧ −1 ≤ x ≤ 3, which simplifies to 0 < x ≤ 3. Consequently, we determine the round-off error symbolic expression for x × x − y under the constraints 0 < x ≤ 3 and −10 ≤ y ≤ 10.
Applying symbolic execution with the round-off error component, we get the symbolic expression for the round-off error:
NMaximize[{a1 + 2*xf*xr + xr^2 - yr, Join[{xf == Sum[Symbol["x" <> ToString[i]]*(2^(11 - i)), {i, 1, 15}]}, Table[0