
Matrix Methods in Data Mining and Pattern Recognition


DOCUMENT INFORMATION

Basic information

Title: Matrix Methods in Data Mining and Pattern Recognition
Author: Lars Eldén
Series editor: Nicholas J. Higham, Editor-in-Chief
Institution: Linköping University
Type: Book
Year of publication: 2007
City: Philadelphia
Format
Number of pages: 234
File size: 18.18 MB

Structure

  • 1.1 Data Mining and Pattern Recognition
  • 1.2 Vectors and Matrices
  • 1.3 Purpose of the Book
  • 1.4 Programming Environments
  • 1.5 Floating Point Computations
  • 1.6 Notation and Conventions
  • 2.1 Matrix-Vector Multiplication
  • 2.2 Matrix-Matrix Multiplication
  • 2.3 Inner Product and Vector Norms
  • 2.4 Matrix Norms
  • 2.5 Linear Independence: Bases
  • 2.6 The Rank of a Matrix
  • 3.1 LU Decomposition
  • 3.2 Symmetric, Positive Definite Matrices
  • 3.3 Perturbation Theory and Condition Number
  • 3.4 Rounding Errors in Gaussian Elimination
  • 3.5 Banded Matrices
  • 3.6 The Least Squares Problem
  • 4.1 Orthogonal Vectors and Matrices
  • 4.2 Elementary Orthogonal Matrices
  • 4.3 Number of Floating Point Operations
  • 4.4 Orthogonal Transformations in Floating Point Arithmetic
  • 5.1 Orthogonal Transformation to Triangular Form
  • 5.2 Solving the Least Squares Problem
  • 5.3 Computing or Not Computing Q
  • 5.4 Flop Count for QR Factorization
  • 5.5 Error in the Solution of the Least Squares Problem
  • 5.6 Updating the Solution of a Least Squares Problem
  • 6.1 The Decomposition
  • 6.2 Fundamental Subspaces
  • 6.3 Matrix Approximation
  • 6.4 Principal Component Analysis
  • 6.5 Solving Least Squares Problems
  • 6.6 Condition Number and Perturbation Theory for the Least Squares Problem
  • 6.7 Rank-Deficient and Underdetermined Systems
  • 6.8 Computing the SVD
  • 6.9 Complete Orthogonal Decomposition
  • 7.1 Truncated SVD: Principal Component Regression
  • 7.2 A Krylov Subspace Method
  • 8.1 Introduction
  • 8.2 Basic Tensor Concepts
  • 8.3 A Tensor SVD
  • 8.4 Approximating a Tensor by HOSVD
  • 9.1 The k-Means Algorithm
  • 9.2 Nonnegative Matrix Factorization
  • 10.1 Handwritten Digits and a Simple Algorithm
  • 10.2 Classification Using SVD Bases
  • 10.3 Tangent Distance
  • 11.1 Preprocessing the Documents and Queries
  • 11.2 The Vector Space Model
  • 11.3 Latent Semantic Indexing
  • 11.4 Clustering
  • 11.5 Nonnegative Matrix Factorization
  • 11.6 LGK Bidiagonalization
  • 11.7 Average Performance
  • 12.1 Pagerank
  • 12.2 Random Walk and Markov Chains
  • 12.3 The Power Method for Pagerank Computation
  • 12.4 HITS
  • 13.1 Saliency Score
  • 13.2 Key Sentence Extraction from a Rank-k Approximation
  • 14.1 Tensor Representation
  • 14.2 Face Recognition
  • 14.3 Face Recognition with HOSVD Compression
  • 15.1 Perturbation Theory
  • 15.2 The Power Method and Inverse Iteration
  • 15.3 Similarity Reduction to Tridiagonal Form
  • 15.4 The QR Algorithm for a Symmetric Tridiagonal Matrix
  • 15.5 Computing the SVD
  • 15.6 The Nonsymmetric Eigenvalue Problem
  • 15.7 Sparse Matrices
  • 15.8 The Arnoldi and Lanczos Methods
  • 15.9 Software

Content

Data Mining and Pattern Recognition

In today's digital age, vast quantities of data are collected and stored in computers, enabling the extraction of valuable information later on. Often, the specific data needs are unknown at the time of collection, resulting in databases that are largely unstructured. The process of extracting useful insights from these large data sets is commonly known as "data mining," often accompanied by the term "knowledge discovery."

Pattern recognition, while frequently viewed as distinct from data mining, has a closely related definition: it involves processing raw data to make decisions based on identified patterns. In this book, we will focus on the similarities between these concepts rather than their differences.

There are numerous application areas for data mining, ranging from e-business [10, 69] to bioinformatics [6], from scientific applications such as the classification of volcanos on Venus [21] to information retrieval [3] and Internet search engines [11].

Data mining is an interdisciplinary field that integrates techniques from computer science, statistics, data analysis, linear algebra, and optimization. Its eclectic approach reflects the practical significance of its applications, leading to a wealth of literature, including numerous books and surveys dedicated to the subject.

Everyday life is intricately linked to advanced mathematical methods for data mining, often without our awareness. Techniques from linear algebra and data analysis form the foundation of many data mining strategies. This book serves as an introduction to the mathematical and numerical methods utilized in data mining and pattern recognition.

Vectors and Matrices

The following examples illustrate the use of vectors and matrices in data mining.

These examples present the main data mining areas discussed in the book, and they will be described in more detail in Part II.

In many applications a matrix is just a rectangular array of data, and the elements are scalars, real numbers:

A = (a_ij) ∈ R^{m×n}.

To treat the data by mathematical methods, some mathematical structure must be added. In the simplest case, the columns of the matrix are considered as vectors in R^m.

Example 1.1. Term-document matrices are used in information retrieval. Consider the following selection of five documents. Key words, which we call terms, are marked in boldface.

Document 1: The Google matrix P is a model of the Internet.

Document 2: P_ij is nonzero if there is a link from Web page j to i.

Document 3: The Google matrix is used to rank all Web pages.

Document 4: The ranking is done by solving a matrix eigenvalue problem.

Document 5: England dropped out of the top 10 in the FIFA ranking.

If we count the frequency of terms in each document we get the following result:

Term        Doc 1  Doc 2  Doc 3  Doc 4  Doc 5
eigenvalue    0      0      0      1      0
England       0      0      0      0      1
FIFA          0      0      0      0      1
Google        1      0      1      0      0
Internet      1      0      0      0      0
link          0      1      0      0      0
matrix        1      0      1      1      0
page          0      1      1      0      0
rank          0      0      1      1      1
Web           0      1      1      0      0

In Document 5, FIFA stands for the Fédération Internationale de Football Association, so the document is about football (soccer). It is a newspaper headline from 2005; after the 2006 World Cup, England re-entered the top 10 of the ranking.

To streamline the example, we have excluded certain words that would normally be regarded as keywords. Note also that only the root form of a word carries significance; for instance, "ranking" is treated identically to "rank."

Thus each document is represented by a vector, or a point, in R^10, and we can organize all documents into a term-document matrix A ∈ R^{10×5}, whose columns are the document vectors.

Now assume that we want to find all documents that are relevant to the query "ranking of Web pages." This is represented by a query vector q ∈ R^10, constructed in a way analogous to the term-document matrix: q has ones in the positions corresponding to the terms page, rank, and Web, and zeros elsewhere.

The query is treated as a document, transforming the information retrieval task into the mathematical problem of identifying the columns of the matrix A that are close to the vector q. To address this, it is essential to employ a suitable distance measure in R^10.

In information retrieval applications it is common to encounter large dimensions, often of the order of 10^6. Since most documents contain only a small subset of the terms, the majority of the elements of the matrix are zero, resulting in what is known as a sparse matrix.

Some methods for information retrieval use linear algebra techniques (e.g., singular value decomposition (SVD)) for data compression and retrieval enhancement.

Vector space methods for information retrieval are presented in Chapter 11.
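As a small illustration of Example 1.1, the following MATLAB sketch builds the 10×5 term-document matrix from the table above and ranks the documents by the cosine of the angle between each column and the query vector for "ranking of Web pages" (the variable names A, q, and cos_sim are ours, and the cosine measure is the one discussed later in the section on inner products and vector norms):

    % Rows correspond to the terms eigenvalue, England, FIFA, Google,
    % Internet, link, matrix, page, rank, Web (in that order).
    A = [0 0 0 1 0;   % eigenvalue
         0 0 0 0 1;   % England
         0 0 0 0 1;   % FIFA
         1 0 1 0 0;   % Google
         1 0 0 0 0;   % Internet
         0 1 0 0 0;   % link
         1 0 1 1 0;   % matrix
         0 1 1 0 0;   % page
         0 0 1 1 1;   % rank
         0 1 1 0 0];  % Web
    q = [0 0 0 0 0 0 0 1 1 1]';              % query: page, rank, Web
    % Cosine of the angle between q and each document (column of A).
    cos_sim = (A'*q) ./ (sqrt(sum(A.^2))' * norm(q));
    disp(cos_sim')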

Often it is useful to consider the matrix not just as an array of numbers, or as a set of vectors, but also as a linear operator. Denote the columns of A by

a_{·j}, j = 1, 2, ..., n.

Then the linear transformation is defined by

y = Ax = (a_{·1} a_{·2} ... a_{·n}) x = Σ_{j=1}^{n} x_j a_{·j}.

The classification of handwritten digits is a fundamental problem in pattern recognition, where digits are represented as vectors. Each digit is depicted as a 16×16 matrix of grayscale values, which can be transformed into a vector in R^256 by stacking the columns of the matrix. A collection of n digits can then be stored and analyzed efficiently in this representation.

Handwritten representations of the digit 3 can be organized into a matrix A ∈ R^{256×n}, whose columns span a subspace of R^256. To compute an approximate basis of this subspace, we use the singular value decomposition (SVD) A = UΣV^T. Figure 1.1 shows three basis vectors that characterize the "3-subspace."

Figure 1.1. Handwritten digits from the U.S. Postal Service database [47], and basis vectors for 3's (bottom).

Let b be a vector representing an unknown digit. We now want to classify it, automatically, as one of the digits 0-9. To determine whether b is a 3, we use the approximate basis vectors u_1, u_2, ..., u_k and check whether there is a linear combination of them such that the residual

b − Σ_{j=1}^{k} x_j u_j

is small. Thus, here we compute the coordinates of b in the basis {u_j}_{j=1}^{k}.

In Chapter 10 we discuss methods for classification of handwritten digits.
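A minimal MATLAB sketch of this classification idea, assuming a matrix A3 whose columns are training images of 3's (stacked as vectors in R^256) and an unknown digit b; the names A3, b, and k are ours, not from the text:

    % A3: 256 x n matrix of training 3's, b: 256 x 1 unknown digit (assumed given).
    k = 10;                                   % number of basis vectors to use
    [U,S,V] = svd(A3, 'econ');                % thin SVD, A3 = U*S*V'
    Uk = U(:, 1:k);                           % approximate basis u_1, ..., u_k
    x = Uk' * b;                              % coordinates of b in the basis
    residual = norm(b - Uk*x) / norm(b);      % small residual suggests b is a 3

In a full classifier the same relative residual is computed against a basis for each of the digits 0-9, and b is assigned to the digit giving the smallest residual; see Chapter 10.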

Data mining involves extracting valuable insights from extensive and often unstructured data sets. To achieve this, it is essential to employ efficient methods specifically tailored for handling large-scale problems, as these applications frequently involve massive matrices.

Search engines extract information from all the Web pages available on the Internet. At the heart of Google's search engine lies a matrix computation, probably the largest that is performed routinely.

The Google matrix P, whose dimension is of the order of billions, reflects the vast number of Web pages on the Internet. The matrix is created from the link structure of the Web: element P_ij is nonzero if there is a link from Web page j to page i.

The following small link graph illustrates a set of Web pages with outlinks and inlinks:

A link graph matrix is created where both the columns and rows represent web pages, with the nonzero elements in column j indicating the outlinks from web page j.

For a search engine to be useful, it must use a measure of quality of the Web pages.

The Google matrix is used to rank all the pages. The ranking is done by solving an eigenvalue problem for P; see Chapter 12.
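The eigenvalue computation can be sketched in a few lines of MATLAB: a power iteration applied to a small column-stochastic link matrix converges to the eigenvector of the dominant eigenvalue, which serves as a ranking vector. The 4×4 matrix below is a made-up toy example, not the link graph from the text; PageRank proper, with its damping and convergence refinements, is treated in Chapter 12.

    % Toy column-stochastic link matrix: column j holds the outlinks of page j.
    P = [0   1/2 0   0  ;
         1/3 0   0   1/2;
         1/3 0   0   1/2;
         1/3 1/2 1   0  ];
    r = ones(4,1)/4;                 % start from the uniform distribution
    for iter = 1:100
        r = P*r;                     % power method step
        r = r/norm(r,1);             % keep r a probability vector
    end
    disp(r')                         % approximate ranking vector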

Purpose of the Book

This book is an application-oriented introduction to modern linear algebra techniques for data mining and pattern recognition rather than a traditional textbook. It emphasizes the use of accessible programming environments, such as MATLAB, to implement the algorithms discussed. The content provides the essential mathematical theory and numerical background needed to use the powerful software tools included in such packages effectively.

For a more comprehensive presentation of the numerical and algorithmic aspects of the matrix decompositions used in this book, see any of the recent textbooks on numerical linear algebra.

The solution of linear systems and eigenvalue problems for large and sparse systems is covered in [4, 5]. Readers interested in complete implementations of numerical linear algebra algorithms can access free software written in Fortran, C, and C++ online [1].

The book is intended for readers with a foundational understanding of linear algebra and scientific computing (numerical analysis). A basic knowledge of a matrix-oriented programming language, such as MATLAB, will help the reader engage with the content effectively.

Programming Environments

In this book we use MATLAB [68] to demonstrate the concepts and the algorithms.

Our codes serve as demonstrations of fundamental principles rather than as software. We prioritize simplicity over efficiency and robustness, so these codes are suitable only for small experiments and not for production computations.

Even though we are using MATLAB, we want to emphasize that any programming environment that implements modern matrix computations can be used.

Floating Point Computations

The execution times of various algorithms can be assessed by counting the number of floating point operations, i.e., arithmetic operations involving floating point numbers. In this book we adhere to the standard procedure of counting each operation individually, referring to one operation as a "flop." For example, the expression y = y + a*x, where the variables are scalars, involves two flops.

It is customary to count only the highest-order term(s). We emphasize that flop counts can be rather crude measures of efficiency and computing time, and they may be misleading in certain situations. On contemporary computers, which invariably have memory hierarchies, the patterns of data access play a crucial role in performance.

Thus there are situations in which the execution times of algorithms with the same flop counts can vary by an order of magnitude.

While error analysis of algorithms is not a primary focus of this book, we will reference some key results without providing proofs. We will assume that computations adhere to the IEEE floating point standard, so that the following model is valid.

In a floating point system a real number x cannot, in general, be represented exactly. The floating point representation of x, denoted fl[x], can be written as fl[x] = x(1 + ε), where ε is a small number satisfying |ε| ≤ μ, with μ the unit round-off of the system. Consequently, the relative error in the floating point representation of any real number x satisfies

|fl[x] − x| / |x| ≤ μ.

In IEEE double precision arithmetic (which is the standard floating point format in MATLAB), the unit round-off satisfies μ ≈ 10^−16. In IEEE single precision we have μ ≈ 10^−7.
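A quick way to inspect these values in MATLAB (a sketch; note that MATLAB's eps is the spacing between 1 and the next larger floating point number, so the unit round-off is eps/2):

    mu_double = eps('double')/2    % approximately 1.1e-16, i.e., mu is about 10^-16
    mu_single = eps('single')/2    % approximately 6.0e-8,  i.e., mu is about 10^-7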

Let fl[x ⊙ y] be the result of a floating point arithmetic operation, where ⊙ denotes any of +, −, *, and /. Then, provided that x ⊙ y ≠ 0,

|x ⊙ y − fl[x ⊙ y]| / |x ⊙ y| ≤ μ   (1.2)

or, equivalently,

fl[x ⊙ y] = (x ⊙ y)(1 + ε)   (1.3)

for some ε satisfying |ε| ≤ μ, where μ is the unit round-off of the floating point system.

When the computation is viewed in terms of backward errors, (1.3) can be rewritten as fl[x ⊙ y] = (x + e) ⊙ (y + f), where e and f are small perturbations of x and y, respectively.

In other words, fl[x ⊙ y] is the exact result of the operation on slightly perturbed data. This is an example of backward error analysis.

In IEEE double precision the smallest positive real number that can be represented is approximately 10^−308 and the largest is approximately 10^308; the corresponding magnitudes apply for negative numbers. If a computation gives as a result a floating point number of magnitude smaller than 10^−308, then a floating point exception called underflow occurs. Similarly, the computation of a floating point number of magnitude larger than 10^308 results in overflow.

Figure 1.2. Vectors in the GJK algorithm.

Detecting collisions between three-dimensional objects is a fundamental challenge in computer graphics, particularly in the realms of video games, animation, and simulation.

In the past, fixed point arithmetic was used for computer graphics, but modern computations typically employ floating point arithmetic. A key problem in this field is to determine the point on a convex body that is closest to the origin. This can be solved using the iterative Gilbert-Johnson-Keerthi (GJK) algorithm, which uses a particular stopping criterion.

The stopping criterion involves the quantity S(v, w) = v^T v − v^T w, which must become smaller than a small tolerance for the iterations to terminate; the vectors v and w are depicted in Figure 1.2. As the solution is approached, these vectors become increasingly close to each other. Numerical difficulties can arise when S(v, w) is computed in floating point arithmetic; see [101, pp. 142-145]. Here we briefly explain the computation in the scalar case, s = v² − vw, which exhibits the same problems as the vector case.

The data may be imprecise due to representation or rounding errors from previous computations, i.e., the stored values are actually v(1 + ε_v) and w(1 + ε_w), where ε_v and ε_w are small, often of the order of magnitude of μ. From (1.2) and (1.3) we see that each arithmetic operation incurs an additional relative error, so that

fl[v² − vw] = (v²(1 + ε_v)²(1 + ε_1) − vw(1 + ε_v)(1 + ε_w)(1 + ε_2))(1 + ε_3),


where we have assumed that |ε_i| ≤ μ. The relative error in the computed quantity can be estimated by

|fl[v² − vw] − (v² − vw)| / |v² − vw| ≲ (v² + |vw|) c μ / |v² − vw|,

where c is a small constant. We see that if v and w are large and close, then the relative error may be large. For instance, with v = 100 and w = 99.999 we have v² − vw = 0.1 while v² + |vw| ≈ 2·10^4, so the relative error in fl[v² − vw] may be of the order 10^5 μ.

In IEEE single precision, which is often used in computer graphics, such a relative error can become so large that the termination criterion is never satisfied, and the iteration may continue indefinitely. There are also other situations in the GJK algorithm in which floating point rounding errors can cause the termination criterion to be unreliable, and special care must be taken; see [101].

Cancellation occurs when two nearly equal numbers, both affected by errors, are subtracted: significant digits are lost and the relative error of the result may become large. For more information on the IEEE standard and rounding errors in floating point computations, see [34, Chapter 2]. Comprehensive analyses of rounding errors in linear algebra algorithms can be found in [50].
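The cancellation effect is easy to reproduce; the following MATLAB sketch evaluates v² − vw from the example above in single precision and compares it with the double precision value (the variable names are ours):

    v = 100;  w = 99.999;
    exact = v^2 - v*w;                        % about 0.1, computed in double precision
    vs = single(v);  ws = single(w);
    s_single = vs^2 - vs*ws;                  % the same expression in single precision
    rel_err = abs(double(s_single) - exact)/abs(exact);
    fprintf('single precision: %g, relative error: %g\n', s_single, rel_err)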

Notation and Conventions

We will consider vectors and matrices with real components. Usually vectors will be denoted by lowercase italic Roman letters and matrices by uppercase italic Roman or Greek letters:

x ∈ R^n,   A = (a_ij) ∈ R^{m×n}.

Tensors, i.e., arrays of real numbers with three or more indices, will be denoted by a calligraphic font, for example S = (s_ijk) ∈ R^{n1×n2×n3}.

We will use R^m to denote the vector space of dimension m over the real field and R^{m×n} for the space of m×n matrices.

The notation e_i = (0, ..., 0, 1, 0, ..., 0)^T, where the 1 is in position i, is used for the "canonical" unit vectors. Often the dimension is apparent from the context.

The identity matrix is denoted I. Sometimes we emphasize the dimension and use I_k for the k×k identity matrix. The notation diag(d_1, ..., d_n) denotes a diagonal matrix. For instance, I = diag(1, 1, ..., 1).

We will assume that the basic notions of linear algebra are known to the reader.

For completeness, some will be recapitulated here.

Matrix-Vector Multiplication

The way we define basic operations in linear algebra influences how we think about abstract concepts. It is common to assume that these operations must be performed in a specific order, but the definitions themselves do not impose any such restriction. For instance, in matrix-vector multiplication, if A is an m×n matrix, the result y = Ax has components

y_i = Σ_{j=1}^{n} a_ij x_j,   i = 1, ..., m.   (2.1)

Symbolically one can illustrate the definition as each component of y being the product of the corresponding row of A with the vector x.

The computation of the different components of the vector y is completely independent of their ordering and can be performed in any sequence. Nevertheless, the definition may suggest that the matrix should be accessed rowwise, as in the MATLAB sketch below.
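A sketch of this row-oriented (dot product) version, written in the same style as the column-oriented code given later in this section:

    for i=1:m
        y(i)=0;
        for j=1:n
            y(i)=y(i)+A(i,j)*x(j);
        end
    end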

In modern computers with memory hierarchies, the sequence in which the operations are performed can significantly affect performance; however, we will not pursue this aspect here.

Alternatively, we can write the operation in the following way. Let a_{·j} be a column vector of A. Then we can write

y = Ax = (a_{·1} a_{·2} ... a_{·n}) x = Σ_{j=1}^{n} x_j a_{·j}.

This can be illustrated symbolically:

Here the vectors are accessed columnwise. In MATLAB, this version can be written⁴

    for i=1:m
        y(i)=0;
    end
    for j=1:n
        for i=1:m
            y(i)=y(i)+A(i,j)*x(j);
        end
    end

or, equivalently, using the vector operations of MATLAB,

    y(1:m)=0;
    for j=1:n
        y(1:m)=y(1:m)+A(1:m,j)*x(j);
    end

The two methods for matrix-vector multiplication differ only in the ordering of the loops in the code. The column-oriented version emphasizes the view of the column vectors of A as basis vectors and the components of x as coordinates with respect to this basis.

⁴ In the terminology of LAPACK [1] this is the SAXPY version of matrix-vector multiplication.

SAXPY is an acronym from the Basic Linear Algebra Subroutine (BLAS) library.

Matrix-Matrix Multiplication

Matrix multiplication can be done in several ways, each representing a different access pattern for the matrices. Let A ∈ R^{m×k} and B ∈ R^{k×n}. The definition of matrix multiplication is

R^{m×n} ∋ C = AB = (c_ij),   c_ij = Σ_{s=1}^{k} a_is b_sj,   i = 1, ..., m,  j = 1, ..., n.   (2.4)

In comparison to the definition of matrix-vector multiplication (2.1), we see that in matrix multiplication each column vector of B is multiplied by A.

We can formulate (2.4) as a matrix multiplication code (assuming that C has been initialized to zeros(m,n)):

    for i=1:m
        for j=1:n
            for s=1:k
                C(i,j)=C(i,j)+A(i,s)*B(s,j);
            end
        end
    end

This is an inner product version of matrix multiplication, which is emphasized in the following equivalent code:

    for i=1:m
        for j=1:n
            C(i,j)=A(i,:)*B(:,j);
        end
    end

It is immediately seen that the loop variables can be permuted in 3! = 6 different ways, and we can write a generic matrix multiplication code:

    for ...
        for ...
            for ...
                C(i,j)=C(i,j)+A(i,s)*B(s,j);
            end
        end
    end

A column-oriented (or SAXPY) version is given by

    for j=1:n
        for s=1:k
            C(:,j)=C(:,j)+A(:,s)*B(s,j);
        end
    end

The matrix A is accessed by columns and B by scalars. In this access pattern, each column of C is built up as a linear combination of the columns of A, with coefficients taken from the corresponding column of B.

In another permutation we let the s-loop be the outermost:

    for s=1:k
        for j=1:n
            C(:,j)=C(:,j)+A(:,s)*B(s,j);
        end
    end

This can be illustrated as follows. Let a_{·s} denote the column vectors of A and let b_{s·}^T denote the row vectors of B. Then matrix multiplication can be written as

C = AB = Σ_{s=1}^{k} a_{·s} b_{s·}^T.

The outer product is a special case of matrix multiplication that follows directly from the definition: given two column vectors x ∈ R^m and y ∈ R^n, their outer product is the m×n matrix xy^T.

Thus the expression above represents C = AB in outer product form, i.e., as an expansion of C in terms of the simple matrices a_{·s} b_{s·}^T. Note that each of these matrices has rank 1.
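As a small check of the outer product form, the following MATLAB sketch accumulates C as a sum of rank-1 matrices and compares it with A*B (the matrices are random test data, not taken from the text):

    m = 4; k = 3; n = 5;
    A = randn(m,k);  B = randn(k,n);
    C = zeros(m,n);
    for s = 1:k
        C = C + A(:,s)*B(s,:);     % rank-1 update: column of A times row of B
    end
    norm(C - A*B)                  % of the order of the unit round-off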

Inner Product and Vector Norms


In this section we will discuss briefly how to measure the "size" of a vector. The most common vector norms are

‖x‖_1 = Σ_{i=1}^{n} |x_i|   (1-norm),
‖x‖_2 = ( Σ_{i=1}^{n} x_i² )^{1/2}   (Euclidean norm, 2-norm),
‖x‖_∞ = max_{1≤i≤n} |x_i|   (max-norm).

The Euclidean vector norm is the generalization of the standard Euclidean distance in R^3 to R^n. All three norms defined here are special cases of the p-norm:

‖x‖_p = ( Σ_{i=1}^{n} |x_i|^p )^{1/p}.
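These norms are available directly in MATLAB; a small sketch with an arbitrary test vector:

    x = [3; -4; 12];
    n1   = norm(x,1);      % 1-norm:         |3| + |-4| + |12| = 19
    n2   = norm(x,2);      % Euclidean norm: sqrt(9 + 16 + 144) = 13
    ninf = norm(x,inf);    % max-norm:       12
    np   = norm(x,3);      % p-norm with p = 3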

Associated with the Euclidean vector norm is the inner product between two vectors x and y in R^n, which is defined as

(x, y) = x^T y.

Generally, a vector norm is a mapping R^n → R with the properties

‖x‖ ≥ 0 for all x,
‖x‖ = 0 if and only if x = 0,
‖αx‖ = |α| ‖x‖, α ∈ R,
‖x + y‖ ≤ ‖x‖ + ‖y‖ (the triangle inequality).

Norms are used to measure the error in an approximation of a vector. For a given vector x, let x̄ be an approximation of x. Then the absolute error is defined as

‖δx‖ = ‖x̄ − x‖,

and, assuming x ≠ 0, the relative error is

‖δx‖ / ‖x‖ = ‖x̄ − x‖ / ‖x‖.

In a finite-dimensional vector space all vector norms are equivalent: for any two norms ‖·‖_α and ‖·‖_β there exist constants m and M, independent of x, such that

m ‖x‖_α ≤ ‖x‖_β ≤ M ‖x‖_α

for all x. For example, for x ∈ R^n,

‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2.

This equivalence implies that if a sequence of vectors (x_i)_{i=1}^{∞} converges to x* in one norm,

lim_{i→∞} ‖x_i − x*‖ = 0,

then it converges to the same limit in all norms.

In data mining applications it is common to use the cosine of the angle between two vectors as a distance measure:

cos θ(x, y) = x^T y / (‖x‖_2 ‖y‖_2).

With this measure two vectors are close if the cosine is close to one. Similarly, x and y are orthogonal if the angle between them is π/2, i.e., if x^T y = 0.

Matrix Norms

For any vector norm we can define a corresponding operator norm. Let ‖·‖ be a vector norm. The corresponding matrix norm is defined as

‖A‖ = sup_{x≠0} ‖Ax‖ / ‖x‖.

One can show that such a matrix norm satisfies (for α ∈ R)

‖A‖ ≥ 0 for all A, with ‖A‖ = 0 if and only if A = 0,
‖αA‖ = |α| ‖A‖,
‖A + B‖ ≤ ‖A‖ + ‖B‖.

For a matrix norm defined as above the following fundamental inequalities hold.

Proposition 2.1. Let ‖·‖ denote a vector norm and the corresponding matrix norm. Then

‖Ax‖ ≤ ‖A‖ ‖x‖,
‖AB‖ ≤ ‖A‖ ‖B‖.

Proof. From the definition we have

‖Ax‖ / ‖x‖ ≤ sup_{y≠0} ‖Ay‖ / ‖y‖ = ‖A‖

for all x ≠ 0, which gives the first inequality. The second is proved by applying the first twice: ‖ABx‖ ≤ ‖A‖ ‖Bx‖ ≤ ‖A‖ ‖B‖ ‖x‖ for all x, and dividing by ‖x‖ and taking the supremum over x ≠ 0 gives ‖AB‖ ≤ ‖A‖ ‖B‖.

One can show that the 2-norm satisfies

‖A‖_2 = ( max_i λ_i(A^T A) )^{1/2},

i.e., it equals the square root of the largest eigenvalue of A^T A.

Thus the computation of ‖A‖_2 for a medium-size or large matrix can be quite expensive, since it requires the largest eigenvalue of A^T A. In contrast, the matrix infinity norm ‖A‖_∞ is much simpler to compute.

In Section 6.1 we will see that the 2-norm of a matrix has an explicit expression in terms of the singular values ofA.
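The relations just mentioned can be checked numerically; in the MATLAB sketch below the 2-norm of a random test matrix is computed in three equivalent ways and compared with the much cheaper infinity norm:

    A = randn(50, 30);
    n2_direct = norm(A, 2);              % built-in 2-norm
    n2_eig    = sqrt(max(eig(A'*A)));    % square root of the largest eigenvalue of A'*A
    n2_svd    = max(svd(A));             % largest singular value (see Section 6.1)
    n_inf     = norm(A, inf);            % infinity norm
    [n2_direct n2_eig n2_svd n_inf]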

Let A ∈ R^{m×n}. In some cases we will treat the matrix not as a linear operator but rather as a point in a space of dimension mn, i.e., R^{mn}. Then we can use the Frobenius matrix norm, which is defined by

‖A‖_F = ( Σ_{i=1}^{m} Σ_{j=1}^{n} a_ij² )^{1/2}.

Sometimes it is practical to write the Frobenius norm in the equivalent form

‖A‖_F² = tr(A^T A),   (2.8)

where the trace of a matrix B ∈ R^{n×n} is the sum of its diagonal elements,

tr(B) = Σ_{i=1}^{n} b_ii.

The Frobenius norm is not an operator norm, but it is easier to compute than the 2-norm. It is closely related to the Euclidean vector norm: it is the Euclidean norm of the matrix considered as an element of the linear space R^{m×n}, identified with R^{mn}.
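A brief MATLAB sketch of the equivalent ways to compute the Frobenius norm discussed above, using a random test matrix:

    A = randn(4, 3);
    f1 = norm(A, 'fro');          % built-in Frobenius norm
    f2 = sqrt(trace(A'*A));       % via (2.8)
    f3 = norm(A(:));              % Euclidean norm of A viewed as a vector in R^(mn)
    [f1 f2 f3]                    % all three agree up to rounding errors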

Linear Independence: Bases

Given a set of vectors (v_j)_{j=1}^{n} in R^m, m ≥ n, consider the set of linear combinations

span(v_1, v_2, ..., v_n) = { y | y = Σ_{j=1}^{n} α_j v_j }

for arbitrary coefficients α_j. The vectors (v_j)_{j=1}^{n} are called linearly independent when Σ_{j=1}^{n} α_j v_j = 0 if and only if α_j = 0 for j = 1, 2, ..., n.

A set of m linearly independent vectors in R^m is called a basis in R^m: any vector in R^m can be expressed as a linear combination of the basis vectors.

Proposition 2.2. Assume that the vectors (v_j)_{j=1}^{n} are linearly dependent. Then some v_k can be written as a linear combination of the rest,

v_k = Σ_{j≠k} β_j v_j.

Proof. There exist coefficients α_j, with some α_k ≠ 0, such that Σ_{j=1}^{n} α_j v_j = 0. Moving the kth term to the other side gives

α_k v_k = Σ_{j≠k} (−α_j) v_j,

which is the same as

v_k = Σ_{j≠k} β_j v_j   with β_j = −α_j / α_k.

If we have a set of linearly dependent vectors, then we can keep a linearly independent subset and express the rest in terms of the linearly independent ones.

The number of linearly independent vectors in a set is a measure of its information content: we can compress the set by using the linearly independent vectors as basis representatives and computing the coordinates of the remaining vectors in this basis. However, in practical applications we often encounter almost linearly dependent vectors rather than strictly linearly dependent ones. For effective and numerically stable data reduction, it is essential that the basis vectors are not only linearly independent but also orthogonal. This topic is explored further in Chapter 4.

The Rank of a Matrix

The rank of a matrix is defined as the maximum number of linearly independent column vectors. It is a standard result in linear algebra that the number of linearly independent column vectors is equal to the number of linearly independent row vectors.

We will see later that any matrix can be represented as an expansion of rank-1 matrices.

Proposition 2.3. An outer product matrix xy^T, where x and y are nonzero vectors in R^n, has rank 1.

Proof. The columns of xy^T are y_1 x, y_2 x, ..., y_n x, i.e., all are multiples of the single vector x. Thus all the columns (rows) of xy^T are linearly dependent.
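Proposition 2.3 can be illustrated numerically (a sketch with arbitrary nonzero test vectors):

    x = randn(5,1);  y = randn(5,1);
    C = x*y';                 % outer product, a 5x5 matrix
    rank(C)                   % returns 1: every column is a multiple of x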

A square matrix A ∈ R^{n×n} with rank n is called nonsingular and has an inverse A^{−1} satisfying

A A^{−1} = A^{−1} A = I.

If we multiply linearly independent vectors by a nonsingular matrix, then the vectors remain linearly independent.

Proposition 2.4. Assume that the vectors v_1, ..., v_p are linearly independent. Then, for any nonsingular matrix T, the vectors Tv_1, ..., Tv_p are linearly independent.

Proof. Obviously Σ_{j=1}^{p} α_j v_j = 0 if and only if Σ_{j=1}^{p} α_j T v_j = 0 (since we can multiply either equation by T or T^{−1}). Therefore the statement follows.

In this chapter we briefly review some facts about the solution of linear systems of equations

Ax = b,   (3.1)

where A ∈ R^{n×n} is square and nonsingular. The linear system (3.1) can be solved using Gaussian elimination with partial pivoting, which is equivalent to factorizing the matrix as a product of triangular matrices.

We will also consider overdetermined linear systems, where the matrix A ∈ R^{m×n} is rectangular with m > n, and their solution using the least squares method.

This chapter provides background results on the matrix decompositions used for solving linear systems of equations, mostly presented without proofs. For a comprehensive treatment of the theory behind these decompositions, readers are referred to, e.g., [42, 92].

Before discussing matrix decompositions, we state the basic result concerning conditions for the existence of a unique solution of (3.1).

Proposition 3.1. Let A ∈ R^{n×n} and assume that A is nonsingular. Then, for any right-hand side b, the linear system Ax = b has a unique solution.

Proof. The result is an immediate consequence of the fact that the column vectors of a nonsingular matrix are linearly independent.

LU Decomposition

Gaussian elimination is conveniently described in terms of Gauss transformations, which also clarify the relationship between Gaussian elimination and LU decomposition; for a comprehensive treatment of Gauss transformations, see standard textbooks on numerical linear algebra. In Gaussian elimination with partial pivoting, the rows are reordered to enhance numerical stability; the reordering is accomplished by permutation matrices, which are identity matrices with the rows reordered; see, e.g., [42, Section 3.4.1].

In the first step of Gaussian elimination with partial pivoting for an n×n matrix A, we reorder the rows so that the element of largest absolute value in the first column is moved to the (1,1) position. This reordering can be represented by multiplying A on the left by a permutation matrix P_1. We then proceed with the elimination, zeroing the elements in the first column below the diagonal by further matrix operations.

The result of the first step of Gaussian elimination with partial pivoting is a matrix of the form (3.2), in which the elements below the diagonal in the first column are zero.

The Gaussian elimination algorithm systematically eliminates elements below the main diagonal by first positioning the largest element in the diagonal spot of the second column, continuing this process to achieve a triangular matrix form.

From (3.2) we see that the first step of Gaussian elimination with partial pivoting can be expressed as a matrix factorization. This is also true of the complete procedure.

Theorem 3.2 (LU decomposition). Any nonsingular n×n matrix A can be decomposed into

PA = LU,

where P is a permutation matrix, L is a lower triangular matrix with ones on the main diagonal, and U is an upper triangular matrix.

Proof (sketch). The theorem can be proved by induction. From (3.2) we have a factorization of the matrix after the first elimination step. By an induction assumption, the remaining submatrix B can be decomposed into P_B B = L_B U_B, and we then see that PA = LU, where P, L, and U are assembled from the factors of the first step together with P_B, L_B, and U_B.

The computation of the LU decomposition requires approximately 2n³/3 flops. In the kth step of Gaussian elimination, one operates on an (n−k+1)×(n−k+1) submatrix, and for each element in that submatrix one multiplication and one addition are performed. Summing 2(n−k+1)² over all steps gives approximately 2n³/3 flops to highest order.
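In MATLAB the factorization of Theorem 3.2 is computed by the built-in lu function; a minimal sketch with a random nonsingular test matrix, also showing how the factorization is used to solve Ax = b by two triangular solves:

    n = 5;
    A = randn(n);  b = randn(n,1);
    [L, U, P] = lu(A);            % P*A = L*U, with partial pivoting
    norm(P*A - L*U)               % of the order of rounding errors
    y = L \ (P*b);                % forward substitution
    x = U \ y;                    % back substitution
    norm(A*x - b)                 % residual of the computed solution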

Symmetric, Positive Definite Matrices

The LU decomposition of a symmetric, positive definite matrix A can be computed without pivoting. In addition, by exploiting the symmetry of the matrix, the decomposition itself can be made symmetric, which reduces the work to half that of the general LU decomposition.

Theorem 3.3 (LDL^T decomposition). Any symmetric, positive definite matrix A can be decomposed into

A = LDL^T,

where L is lower triangular with ones on the main diagonal and D is a diagonal matrix with positive diagonal elements.

Example 3.4 gives the LDL^T decomposition of a small positive definite matrix.

The diagonal elements in D are positive, and therefore we can define D^{1/2} = diag(√d_1, ..., √d_n) and put

A = LDL^T = (LD^{1/2})(D^{1/2}L^T) = U^T U,

where U = D^{1/2}L^T is an upper triangular matrix. This variant of the LDL^T decomposition is called the Cholesky decomposition.

For a symmetric matrix, only the main diagonal and the elements above it need to be stored, n(n + 1)/2 matrix elements in total; this storage requirement is the same for the LDL^T and Cholesky decompositions. Similarly, since only half the elements of a general LU decomposition need to be computed, the work is reduced to approximately n³/3 flops. Note that the LDL^T decomposition can be computed directly, i.e., the elements of L and D are obtained without first computing the LU decomposition.
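A sketch of the Cholesky variant in MATLAB, using a symmetric positive definite test matrix constructed for the purpose:

    n = 4;
    B = randn(n);
    A = B'*B + n*eye(n);          % symmetric positive definite test matrix
    U = chol(A);                  % upper triangular factor with A = U'*U
    norm(A - U'*U)                % of the order of rounding errors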

Perturbation Theory and Condition Number

The condition number of a nonsingular matrix A is defined as

κ(A) = ‖A‖ ‖A^{−1}‖,

where ‖·‖ denotes any operator norm. If we use a particular matrix norm, e.g., the 2-norm, we attach the corresponding subscript and write κ_2(A).

The condition number is used to quantify how much the solution of a linear system Ax = b can change when the matrix and the right-hand side are perturbed by a small amount.
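A small MATLAB experiment illustrating this: perturb A and b slightly and compare the relative change in the solution with the scale suggested by the condition number (all data are random test values, and the comparison is only indicative of the first-order behavior):

    n = 6;
    A = randn(n);  b = randn(n,1);
    x = A \ b;
    delta = 1e-8;
    dA = delta*randn(n);  db = delta*randn(n,1);   % small perturbations
    x_pert = (A + dA) \ (b + db);
    rel_change = norm(x_pert - x)/norm(x)
    kappa = cond(A)                                % condition number kappa_2(A)
    kappa * (norm(dA)/norm(A) + norm(db)/norm(b))  % first-order scale of the change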

Theorem 3.5. Assume that A is nonsingular and that ‖δA‖ ‖A^{−1}‖ = r

Date posted: 26/05/2022, 08:41
