Basic Definitions and Some Properties
By N we denote the set of natural numbers, i.e., N = {0, 1, 2, …}. Consider the n-dimensional Euclidean vector space X = R^n equipped with the canonical inner product ⟨x, u⟩ := Σ_{i=1}^{n} x_i u_i for all vectors x = (x_1, …, x_n) and u = (u_1, …, u_n). Although vectors in R^n are displayed as rows of real numbers in this text, they are treated as columns in matrix computations. The transpose of a matrix A ∈ R^{m×n} is denoted by A^T; thus ⟨x, u⟩ = x^T u.
The norm in X is given by ‖x‖ = ⟨x, x⟩^{1/2}. Then the dual space Y of X can be identified with X.
A function θ : X → R ∪ {±∞} is said to be proper if it does not take the value −∞ and is not identically equal to +∞, i.e., there exists at least one x ∈ X with θ(x) ∈ R.
The effective domain of θ is defined by domθ := {x ∈ X : θ(x) < +∞}.
Let Γ_0(X) be the set of all lower semicontinuous, proper, convex functions on X. The Fenchel conjugate function g* of a function g ∈ Γ_0(X) is defined by g*(y) = sup{⟨x, y⟩ − g(x) | x ∈ X} for all y ∈ Y.
Note that g* : Y → R ∪ {+∞} is also a lower semicontinuous, proper, convex function [38, Proposition 3, p. 174]. From the definition it follows that g(x) + g*(y) ≥ ⟨x, y⟩ for all x ∈ X and y ∈ Y.
Denote by g** the conjugate function of g*, i.e., g**(x) = sup{⟨x, y⟩ − g*(y) | y ∈ Y}. Since g ∈ Γ_0(X), one has g**(x) = g(x) for all x ∈ X by the Fenchel–Moreau theorem ([38, Theorem 1, p. 175]). This fact is the basis for various duality theorems in convex programming and DC programming.
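As a quick illustration of the conjugate operation, the following display (a routine computation, not taken from [38]) derives the conjugate of the one-dimensional quadratic g(x) = (x − 1)^2; the result is used again in Examples 1.3 and 1.4 below.

```latex
% Conjugate of g(x) = (x-1)^2 on X = R, computed directly from the definition.
\[
  g^{*}(y) = \sup_{x \in \mathbb{R}} \bigl\{ xy - (x-1)^{2} \bigr\}.
\]
% The supremum is attained where the derivative vanishes: y - 2(x - 1) = 0,
% i.e., at x = 1 + y/2.  Substituting this maximizer gives
\[
  g^{*}(y) = \Bigl(1 + \tfrac{y}{2}\Bigr)y - \tfrac{y^{2}}{4}
           = \tfrac{1}{4}y^{2} + y .
\]
```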
Definition 1.1 The subdifferential of a convex function ϕ : R^n → R ∪ {+∞} at u ∈ dom ϕ is the set
∂ϕ(u) := {v ∈ R^n | ⟨v, x − u⟩ ≤ ϕ(x) − ϕ(u) for all x ∈ R^n}. (1.1)
If x ∉ dom ϕ then one puts ∂ϕ(x) = ∅.
Clearly, the subdifferential ∂ϕ(u) in (1.1) is a closed, convex set. The Fermat Rule for convex optimization problems asserts that x̄ ∈ R^n is a solution of the minimization problem min{ϕ(x) | x ∈ R^n} if and only if 0 ∈ ∂ϕ(x̄).
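For later use in Examples 1.1, 1.3, and 1.4, here are the subdifferentials of the two one-dimensional convex functions that appear there, computed directly from (1.1) together with a Fermat Rule check (a routine verification, included only for reference).

```latex
% Subdifferentials of |x - 1| and (x - 1)^2 on R, obtained from (1.1):
\[
  \partial |\,\cdot - 1\,|(x) =
  \begin{cases}
    \{-1\},  & x < 1,\\
    [-1, 1], & x = 1,\\
    \{1\},   & x > 1,
  \end{cases}
  \qquad
  \partial \bigl((\cdot - 1)^{2}\bigr)(x) = \{\,2(x - 1)\,\}.
\]
% Fermat Rule check: 0 belongs to [-1, 1], so x = 1 minimizes |x - 1| over R.
```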
We now recall some useful properties of the Fenchel conjugate functions. The proofs of the next two propositions can be found in [77].
Proposition 1.1 The inclusion x ∈ ∂g*(y) is equivalent to the equality g(x) + g*(y) = ⟨x, y⟩.
Proposition 1.2 The inclusions y ∈ ∂g(x) and x ∈ ∂g ∗ (y) are equivalent.
In the sequel, we use the convention (+∞)−(+∞)=+∞.
Definition 1.2 The optimization problem inf{f(x) := g(x) − h(x) | x ∈ X}, (P) where g and h are functions belonging to Γ_0(X), is called a DC program. The functions g and h are called DC components of f.
Definition 1.3 For any g, h ∈ Γ 0 (X), the DC program inf{h ∗ (y)−g ∗ (y) | y ∈ Y}, (D) is called the dual problem of (P).
Proposition 1.3 (Toland’s Duality Theorem; see [79]) The DC programs (P) and (D) have the same optimal value.
Definition 1.4 One says that ¯x ∈ R n is a local solution of (P) if the value f(¯x) = g(¯x) − h(¯x) is finite (i.e., ¯x ∈ domg ∩ domh) and there exists a neighborhood U of ¯x such that g(¯x)−h(¯x) ≤ g(x)−h(x) ∀x ∈ U.
If we can choose U = R n , then ¯x is called a (global) solution of (P).
The set of the solutions (resp., the local solutions) of (P) is denoted by sol(P) (resp., by loc(P)).
Proposition 1.4 (First-order optimality condition; see [77]) If x¯ is a local solution of (P), then ∂h(¯x) ⊂ ∂g(¯x).
Definition 1.5 A point ¯x ∈ R n satisfying ∂h(¯x) ⊂ ∂g(¯x) is called a station- ary point of (P).
The forthcoming example, which is similar to Example 1.1 in [93], shows that a stationary point needs not to be a local solution.
Example 1.1 Consider the DC program (P) with f(x) = g(x) − h(x), where g(x) = |x − 1| and h(x) = (x − 1)^2 for all x ∈ R. For x̄ := 1/2, we have ∂g(x̄) = ∂h(x̄) = {−1}. Since ∂h(x̄) ⊂ ∂g(x̄), x̄ is a stationary point of (P). But x̄ is not a local solution of (P), because f(x) = x − x^2 for all x ≤ 1.
Definition 1.6 A vector x̄ ∈ R^n is said to be a critical point of (P) if ∂g(x̄) ∩ ∂h(x̄) ≠ ∅.
If ∂h(x̄) ≠ ∅ and x̄ is a stationary point of (P), then x̄ is a critical point of (P). The reverse implication does not hold in general. The following example is similar to Example 1.2 in [93].
Example 1.2 Consider the DC program (P) with f(x) = g(x) − h(x), where g(x) = (x − 1/2)^2 and h(x) = |x − 1| for all x ∈ R. For x̄ := 1, we have ∂g(x̄) = {1} and ∂h(x̄) = [−1, 1]. Hence ∂g(x̄) ∩ ∂h(x̄) ≠ ∅, so x̄ is a critical point of (P). But x̄ is not a stationary point of (P), because ∂h(x̄) is not a subset of ∂g(x̄).
In problem (P), if h is Gâteaux differentiable at x̄ ∈ dom h, then ∂h(x̄) is a singleton and ∂h(x̄) = {∇_G h(x̄)}, where ∇_G h(x̄) denotes the Gâteaux derivative of h at x̄; conversely, if ∂h(x̄) is a singleton, then h is Gâteaux differentiable at x̄. In this situation, the condition ∂g(x̄) ∩ ∂h(x̄) ≠ ∅ is equivalent to the inclusion ∂h(x̄) ⊂ ∂g(x̄). So, if h is Gâteaux differentiable at x̄, then x̄ is a critical point of (P) if and only if it is a stationary point of (P).
DCA Schemes
The idea of DCAs (DC Algorithms) is to decompose the DC program (P) into two sequences of convex programs (P_k) and (D_k), k ∈ N. A DCA constructs two sequences {x^k} and {y^k} such that, for each k ∈ N, x^k is a solution of the convex program (P_k) and y^k is a solution of the convex program (D_k), and the following properties hold:
(i) The sequences {(g−h)(x k )} and {(h ∗ −g ∗ )(y k )} are decreasing;
(ii) Any cluster point ¯x (resp ¯y) of {x k } (resp., of {y k }) is a critical point of (P) (resp., of (D)).
Following Tuan [93], we can formulate and analyze the general DC algo- rithm of [77] as follows.
Step 3 k ← k+ 1 and return to Step 2.
For each k ≥ 0, we have constructed a pair (x^k, y^k) satisfying (1.2) and (1.3). Thanks to Proposition 1.2, we can transform the inclusion (1.2) equivalently as y^k ∈ ∂h(x^k). Consequently, the condition (1.2) is equivalent to the requirement that y^k is a solution of the problem
min{h*(y) − [g*(y^{k−1}) + ⟨x^k, y − y^{k−1}⟩] | y ∈ Y}, (D_k)
where y^{k−1} ∈ dom g* is the vector defined at the previous step k − 1. The inclusion x^k ∈ ∂g*(y^{k−1}) means that g*(y) − g*(y^{k−1}) ≥ ⟨x^k, y − y^{k−1}⟩ for all y ∈ Y. Thus the affine function g*(y^{k−1}) + ⟨x^k, y − y^{k−1}⟩ is a lower approximation of g*(y). Replacing g*(y) in the objective function of problem (D) by this lower approximation at step k, we obtain the auxiliary problem (D_k).
Since (D k ) is a convex program, solving (D k ) is much easier than solving the DC program (D) Recall that y k is a solution of (D k ).
Similarly, at each step k + 1, the DC program (P) is replaced by the problem min{g(x) − [h(x^k) + ⟨x − x^k, y^k⟩] | x ∈ X}, (P_k) where x^k ∈ dom h* has been defined at step k.
Since (P k ) is a convex program, solving (P k ) is much easier than solving the original DC program (P) As x k+1 satisfies (1.3), it is a solution of (P k ).
The objective function of (D_k) is a convex upper approximation of the objective function of (D), and the two functions have the same value at y^{k−1}. Removing constant terms from the objective function of (D_k), we can rewrite the problem in the equivalent form
min{h*(y) − ⟨x^k, y⟩ | y ∈ Y}. (1.4)
Likewise, the objective function of (P_k) is a convex upper approximation of the objective function of (P), and the two functions have the same value at x^k. Removing constant terms from the objective function of (P_k), we can rewrite the problem in the equivalent form
min{g(x) − ⟨x, y^k⟩ | x ∈ X}. (1.5)
If x^k is a critical point of (P), i.e., ∂g(x^k) ∩ ∂h(x^k) ≠ ∅, then DCA may produce a sequence {(x^ℓ, y^ℓ)} with x^ℓ = x^k and y^ℓ = y^k for all ℓ ≥ k.
Indeed, since there exists a point ¯x ∈ ∂g(x k ) ∩ ∂h(x k ), to satisfy (1.2) we can choose y k = ¯x Next, by Proposition 1.2, the inclusion (1.3) is equivalent to y k ∈ ∂g(x k+1 ) So, if we choose x k+1 = x k then (1.3) is fulfilled, because y k = ¯x ∈ ∂g(x k ).
The DCA leads us to critical points, but it has no means of escaping from them. When a critical point that is not a local minimizer is encountered, more sophisticated tools of variational analysis are needed to find a descent direction.
The following observations can be found in Tuan [93]:
• The DCA is a decomposition procedure which decomposes the solution of the pair of optimization problems (P) and (D) into the parallel solution of the sequence of convex minimization problems (P k ) and (D k ), k ∈ N;
• The DCA does not include any specific technique for solving the convex problems (P k ) and (D k ) Such techniques should be imported from convex programming;
• The performance of DCA depends greatly on a concrete decomposition of the objective function into DC components;
• Although DCA is a deterministic optimization method, it can produce different sequences {x^k} and {y^k} for the same initial point x^0, because the selections of y^k ∈ sol(D_k) and x^{k+1} ∈ sol(P_k) at each step k are heuristic whenever (D_k) or (P_k) has more than one solution.
The above analysis allows us to formulate a simplified version of DCA (DCA Scheme 1.2), which includes a termination procedure, as follows; a small computational sketch is given right after the scheme.
Output: Finite or infinite sequences {x^k} and {y^k}.
Step 1. Choose x^0 ∈ dom g. Take ε > 0. Put k = 0.
Step 2. Calculate y^k by solving the convex program (1.4). Calculate x^{k+1} by solving the convex program (1.5).
Step 3. If ‖x^{k+1} − x^k‖ ≤ ε then stop, else go to Step 4.
Step 4. k := k + 1 and return to Step 2.
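The following Python sketch implements the scheme just stated under the assumption that the caller supplies solvers for the convex subproblems (1.4) and (1.5); the names dca, solve_D and solve_P are illustrative, and the demo instantiates them with the closed-form solutions available for the one-dimensional data of Example 1.4 below.

```python
# Minimal sketch of DCA Scheme 1.2.  The subproblem solvers are supplied by the
# caller; here they are closed-form formulas valid only for the demo data.

def dca(x0, solve_D, solve_P, eps=0.0, max_iter=50):
    """Run DCA Scheme 1.2 and return the iterates {x^k} and {y^k}."""
    xs, ys = [x0], []
    for _ in range(max_iter):
        y = solve_D(xs[-1])          # Step 2: y^k solves (1.4), min h*(y) - <x^k, y>
        x_next = solve_P(y)          # Step 2: x^{k+1} solves (1.5), min g(x) - <x, y^k>
        ys.append(y)
        xs.append(x_next)
        if abs(x_next - xs[-2]) <= eps:   # Step 3: termination test
            break
    return xs, ys

# Demo data (Example 1.4): g(x) = |x - 1|, h(x) = (x - 1)^2 on R.
# Then h*(y) = y^2/4 + y, so (1.4) gives y = 2(x - 1), and (1.5) reduces to
# min |x - 1| - y*x, whose unique solution is x = 1 whenever |y| < 1 (true here).
xs, ys = dca(2.0 / 3.0,
             solve_D=lambda x: 2.0 * (x - 1.0),
             solve_P=lambda y: 1.0)
print(xs, ys)   # [0.666..., 1.0, 1.0] and [-0.666..., 0.0]
```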
To understand the performance of the above DCA schemes, let us consider the following example.
Example 1.3 Consider the function f(x) = g(x) − h(x) with g(x) = (x − 1)^2 and h(x) = |x − 1| for all x ∈ R. Here Y = X = R and we have g*(y) = sup{xy − g(x) | x ∈ R} = sup{xy − (x − 1)^2 | x ∈ R} = (1/4)y^2 + y. Hence ∂g*(y) = {(1/2)y + 1} for every y ∈ Y, while ∂h(x) = {−1} for x < 1, ∂h(x) = {1} for x > 1, and ∂h(x) = [−1, 1] for x = 1. Applying DCA Scheme 1.1, we construct two DCA sequences {x^k} and {y^k} with y^k ∈ ∂h(x^k) and x^{k+1} ∈ ∂g*(y^k) for k ∈ N. Starting from any x^0 > 1, we get y^0 = 1 and x^1 = 3/2; hence y^1 = 1 and, for all k ≥ 2, x^k = 3/2 and y^k = 1, so the sequences converge to x̄ = 3/2 and ȳ = 1. Starting from any x^0 < 1, we obtain x^k = 1/2 and y^k = −1 for all k ≥ 1; these sequences converge to x̄ = 1/2 and ȳ = −1.
Since f(x) = x^2 − x for x ≤ 1 and f(x) = x^2 − 3x + 2 for x ≥ 1, one finds that x̄ = 3/2 and x̂ = 1/2 are global minimizers of (P), and x̃ := 1 is a critical point of the problem.
Now start from the initial point x^0 = x̃ = 1 and select y^0 = 0 from ∂h(x^0) = [−1, 1]. Then x^1 ∈ ∂g*(y^0) = ∂g*(0) = {1}, so x^1 = 1. Choosing y^1 = 0 from ∂h(x^1) = [−1, 1] and continuing in this way, we obtain DCA sequences {x^k} and {y^k} converging to x̃ = 1 and ȳ = 0, respectively. It is worth noting that the limit point x̃ = 1 of {x^k} is a critical point of (P) which is neither a local minimizer nor a stationary point of (P).
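The three behaviours just described can be reproduced with a few lines of Python (an illustrative sketch, not part of the text): for the data of Example 1.3 one has ∂g*(y) = {y/2 + 1}, so the iteration y^k ∈ ∂h(x^k), x^{k+1} = y^k/2 + 1 is explicit; the subgradient chosen from ∂h(1) = [−1, 1] is passed in as a parameter because it is not unique.

```python
# DCA Scheme 1.1 specialized to Example 1.3: g(x) = (x - 1)^2, h(x) = |x - 1|.

def dca_example_1_3(x0, tie_break=0.0, steps=30):
    """Iterate y^k in dh(x^k), x^{k+1} = y^k / 2 + 1; tie_break is the subgradient
    used when x^k = 1, where dh(1) = [-1, 1]."""
    x, y = x0, tie_break
    for _ in range(steps):
        y = -1.0 if x < 1.0 else (1.0 if x > 1.0 else tie_break)
        x = 0.5 * y + 1.0
    return x, y

print(dca_example_1_3(5.0))    # (1.5, 1.0):  converges to the global minimizer 3/2
print(dca_example_1_3(0.3))    # (0.5, -1.0): converges to the global minimizer 1/2
print(dca_example_1_3(1.0))    # (1.0, 0.0):  stays at the critical point 1
```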
To ease the presentation of some related programs, we consider the following scheme.
Output: Finite or infinite sequences {x k } and {y k }.
Step 1 Choose x 0 ∈ domg Take ε > 0 Put k = 0.
Step 2. Calculate y^k by using (1.2) and find x^{k+1} ∈ argmin{g(x) − ⟨x, y^k⟩ | x ∈ X}. (1.6)
Step 3 If ||x k+1 −x k || ≤ε then stop, else go to Step 4.
Step 4 k := k+ 1 and return to Step 2.
General Convergence Theorem
We will recall the fundamental theorem on DCAs of Pham Dinh Tao and Le Thi Hoai An [77, Theorem 3], which provides a solid theoretical foundation for the practical application of these algorithms. To this end, we first recall the definitions of ρ-convex functions, of the modulus of convexity of a convex function, and of strongly convex functions.
Definition 1.7 Let ρ ≥ 0 and let C be a convex set in the space X. A function θ : C → R ∪ {+∞} is called ρ-convex if
θ(λx + (1 − λ)x′) ≤ λθ(x) + (1 − λ)θ(x′) − (λ(1 − λ)/2)ρ‖x − x′‖^2
for all numbers λ ∈ (0, 1) and vectors x, x′ ∈ C. This amounts to saying that the function θ(·) − (ρ/2)‖·‖^2 is convex on C.
Definition 1.8 The modulus of convexity of θ on C is given by ρ(θ, C) = sup{ρ ≥ 0 | θ − (ρ/2)‖·‖^2 is convex on C}. If C = X then we write ρ(θ) instead of ρ(θ, C). The function θ is called strongly convex on C if ρ(θ, C) > 0.
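For convex quadratic functions the modulus of convexity is explicit; the following display records this standard fact (assuming Q is a symmetric positive semidefinite matrix and E denotes the unit matrix), which is used when the DCA theory is specialized to quadratic programs later on.

```latex
% Modulus of convexity of theta(x) = (1/2) x^T Q x + q^T x with Q symmetric PSD:
% theta - (rho/2)||.||^2 has Hessian Q - rho*E, which is positive semidefinite
% exactly when rho <= lambda_1(Q).  Hence
\[
  \rho(\theta) \;=\; \sup\{\rho \ge 0 \mid Q - \rho E \succeq 0\} \;=\; \lambda_{1}(Q).
\]
```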
Consider the problem (P). If ρ(g) > 0 (resp., ρ(g*) > 0), let ρ_1 (resp., ρ_1^*) be a real number such that 0 ≤ ρ_1 < ρ(g) (resp., 0 ≤ ρ_1^* < ρ(g*)). If ρ(g) = 0 (resp., ρ(g*) = 0), let ρ_1 = 0 (resp., ρ_1^* = 0). If ρ(h) > 0 (resp., ρ(h*) > 0), let ρ_2 (resp., ρ_2^*) be a real number such that 0 ≤ ρ_2 < ρ(h) (resp., 0 ≤ ρ_2^* < ρ(h*)). If ρ(h) = 0 (resp., ρ(h*) = 0), let ρ_2 = 0 (resp., ρ_2^* = 0).
The convenient abbreviations dx^k := x^{k+1} − x^k and dy^k := y^{k+1} − y^k were adopted in [77].
Theorem 1.1 ([77, Theorem 3]) Let α := inf{f(x) = g(x) − h(x) | x ∈ R^n}. Assume that the iteration sequences {x^k} and {y^k} are generated by DCA Scheme 1.1. Then, the following properties are valid:
≤ (g − h)(x^k) − max{ ((ρ_1 + ρ_2)/2)‖dx^k‖^2, (ρ_1^*/2)‖dy^{k−1}‖^2 + (ρ_2/2)‖dx^k‖^2, (ρ_1^*/2)‖dy^{k−1}‖^2 + (ρ_2^*/2)‖dy^k‖^2 } hold for every k;
… ‖dx^{k+1}‖^2 + (ρ_2/2)‖dx^k‖^2, (ρ_1^*/2)‖dy^k‖^2 + (ρ_2/2)‖dx^k‖^2 } hold for every k;
(iii) If α is finite, then {(g − h)(x^k)} and {(h* − g*)(y^k)} are decreasing sequences that converge to the same limit β ≥ α. Furthermore,
(a) If ρ(g) + ρ(h) > 0 (resp., ρ(g*) + ρ(h*) > 0), then lim_{k→∞}(x^{k+1} − x^k) = 0 (resp., lim_{k→∞}(y^{k+1} − y^k) = 0);
(iv) If α is finite, and {x^k} and {y^k} are bounded, then for every cluster point x̄ of {x^k} (resp., ȳ of {y^k}), there is a cluster point ȳ of {y^k} (resp., x̄ of {x^k}) such that:
The estimates in the assertions (i) and (ii) of the above theorem can be slightly improved as shown in the next remark.
Remark 1.1 If ρ(h) > 0, then ρ_2 can be any real number in [0, ρ(h)). Since the sequences {x^k} and {y^k} are constructed independently of the constants ρ_1, ρ_1^*, ρ_2, ρ_2^*, assertion (i) of Theorem 1.1 implies that for every k ∈ N the corresponding inequality holds with this ρ_2:
… (ρ_2^*/2)‖dy^k‖^2 }. Passing the last inequality to the limit as ρ_2 → ρ(h), we get
Applying the same argument to the constants associated with the strongly convex functions among {g, h, g*, h*}, one obtains the following improved versions of the estimates in assertions (i) and (ii) of Theorem 1.1:
≤ (g − h)(x^k) − max{ ((ρ(g) + ρ(h))/2)‖dx^k‖^2, (ρ(g*)/2)‖dy^{k−1}‖^2 + (ρ(h)/2)‖dx^k‖^2, (ρ(g*)/2)‖dy^{k−1}‖^2 + (ρ(h*)/2)‖dy^k‖^2 },
… ‖dx^{k+1}‖^2 + (ρ(h)/2)‖dx^k‖^2, (ρ(g*)/2)‖dy^k‖^2 + (ρ(h)/2)‖dx^k‖^2 }.
The forthcoming example is designed as an illustration for Theorem 1.1.
Example 1.4 Consider the function f(x) = g(x) − h(x) from Example 1.1, where g(x) = |x − 1| and h(x) = (x − 1)^2 for all x ∈ R. Here Y = X = R and we have h*(y) = sup{xy − h(x) | x ∈ R} = sup{xy − (x − 1)^2 | x ∈ R} = (1/4)y^2 + y. Using DCA Scheme 1.2, we calculate DCA sequences {x^k} and {y^k} by solving, respectively, the convex programs (1.4) and (1.5) for k ∈ N. Choose ε = 0. First, select x^0 = 2/3. Since y^0 is a solution of (1.4) for k = 0, we get y^0 = −2/3. Solving (1.5) for k = 0 gives x^1 = 1; hence y^k = 0 for every k ≥ 1 and x^k = 1 for every k ≥ 2. The algorithm terminates after one step, at k = 1, yielding x̄ = x^2 = 1, which is the unique local solution of (P). The same happens for any initial point x^0 ∈ (1/2, 3/2). If x^0 = 1/2 or x^0 = 3/2, then the algorithm may stop at k = 0 with x̄ = x^1 = x^0, a stationary point of (P) which is not a local solution. If x^0 < 1/2 or x^0 > 3/2, then f(x^k) → −∞ as k → ∞; so {x^k} does not have any cluster point.
Convergence Rates
In Chapters 2 and 4, we will obtain several results on convergence rates of iterative sequences. Two types of linear convergence will be used: Q-linear convergence and R-linear convergence. Let us recall these notions.
Definition 1.9 (See, e.g., [70, p. 28] and [88, pp. 293–294]) One says that a sequence {x^k} ⊂ R^n converges Q-linearly to a vector x̄ ∈ R^n if there exists β ∈ (0, 1) such that ‖x^{k+1} − x̄‖ ≤ β‖x^k − x̄‖ for all k sufficiently large.
Clearly, if x^k ≠ x̄, then the relation ‖x^{k+1} − x̄‖ ≤ β‖x^k − x̄‖ in Definition 1.9 can be rewritten equivalently as ‖x^{k+1} − x̄‖ / ‖x^k − x̄‖ ≤ β. The word “Q”, which stands for “quotient”, comes from this context.
Definition 1.10 (See, e.g., [70, p. 30]) One says that a sequence {x^k} ⊂ R^n converges R-linearly to a vector x̄ ∈ R^n if there is a sequence of nonnegative scalars {μ_k} such that ‖x^k − x̄‖ ≤ μ_k for all k sufficiently large, and {μ_k} converges Q-linearly to 0.
If a sequence {x^k} converges Q-linearly to a vector x̄, then it also converges R-linearly to x̄. Indeed, choose β ∈ (0, 1) satisfying the condition in Definition 1.9 and set μ_k := β‖x^{k−1} − x̄‖ for all k ≥ 1. Then ‖x^k − x̄‖ ≤ μ_k for all k sufficiently large, and {μ_k} converges Q-linearly to 0 because μ_{k+1} ≤ βμ_k for all k large enough. R-linear convergence does not imply Q-linear convergence. To see this, consider a sequence of positive scalars {x_k} of the type given in [70, p. 30], for instance x_k = 1 + 2^{−k} if k is even and x_k = 1 if k is odd, and observe that {x_k} converges R-linearly to 1, while it does not converge Q-linearly to 1.
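A small numerical check (an illustration added here, with the concrete sequence chosen for convenience) shows both effects: the Q-linear quotient test fails because the error is exactly zero at every odd index, while the root test (1.7), discussed next, gives a value of about 1/2.

```python
# x_k = 1 + 2^{-k} for even k, x_k = 1 for odd k: R-linear but not Q-linear to 1.

def x(k):
    return 1.0 + 2.0 ** (-k) if k % 2 == 0 else 1.0

# Q-linear test: |x_{k+1} - 1| / |x_k - 1| must stay below some beta < 1,
# but the denominator vanishes at every odd k while the numerator does not.
for k in (9, 10):
    num, den = abs(x(k + 1) - 1.0), abs(x(k) - 1.0)
    print(k, num / den if den > 0 else "undefined (zero denominator)")

# Root test (1.7): limsup_k |x_k - 1|^{1/k} -- the even-index terms give exactly 1/2.
print(max(abs(x(k) - 1.0) ** (1.0 / k) for k in range(1, 60)))   # 0.5
```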
Sometimes, one says that a sequence {x^k} ⊂ R^n converges R-linearly to a vector x̄ ∈ R^n whenever limsup_{k→∞} ‖x^k − x̄‖^{1/k} < 1 (1.7)
(see, e.g., [92]) The word “R”, which stands for “root”, comes from this context.
The next proposition clarifies the equivalence between the characterization of R-linear convergence in (1.7) and the one given in Definition 1.10.
Proposition 1.5 A sequence {x^k} ⊂ R^n converges R-linearly to a vector x̄ ∈ R^n if and only if the strict inequality (1.7) holds.
Proof. Necessity: Suppose that {x^k} converges R-linearly to x̄. Then there is a sequence of nonnegative scalars {μ_k} such that ‖x^k − x̄‖ ≤ μ_k for all k sufficiently large and {μ_k} converges Q-linearly to 0. Hence we can find β ∈ (0, 1) and k_1 ∈ N such that μ_{k+1} ≤ βμ_k for all k ≥ k_1. Without loss of generality we may assume that μ_{k_1} > 0. For every k > k_1 one has ‖x^k − x̄‖ ≤ μ_k ≤ βμ_{k−1} ≤ ⋯ ≤ β^{k−k_1}μ_{k_1}. It follows that ‖x^k − x̄‖ ≤ (μ_{k_1}/β^{k_1})β^k for all k > k_1. Therefore, limsup_{k→∞} ‖x^k − x̄‖^{1/k} ≤ β limsup_{k→∞} (μ_{k_1}/β^{k_1})^{1/k} = β < 1, so (1.7) holds.
Sufficiency: Suppose that (1.7) is valid. Then there exist γ ∈ (0, 1) and k_2 ∈ N such that ‖x^k − x̄‖^{1/k} ≤ γ for all k ≥ k_2; hence ‖x^k − x̄‖ ≤ γ^k for all k ≥ k_2. Setting μ_k := γ^k for k ∈ N, we obtain a sequence {μ_k} with ‖x^k − x̄‖ ≤ μ_k for all k ≥ k_2. Since μ_{k+1} = γμ_k for every k and lim_{k→∞} μ_k = 0, the sequence {μ_k} converges Q-linearly to 0. Thus {x^k} converges R-linearly to x̄. □
Analysis of an Algorithm in Indefinite Quadratic Programming
Indefinite Quadratic Programs and DCAs
Indefinite quadratic programming (IQP) plays a crucial role in optimization theory and has diverse applications. Sequential quadratic programming methods, such as Wilson's method and Pang's method, transform nonlinear programming problems into a series of IQP problems. Applications of IQP in finance, agriculture, economics, production, marketing, and public policy have been reviewed by Gupta. Quadratic programming models in finance are discussed in detail by Cornuéjols et al. The image enhancement problem has also been formulated as a quadratic program, as demonstrated by Jen and Wang. Methodological and functional aspects of quadratic programming are reviewed by McCarl et al. Akoa has studied IQP in the training of support vector machines with nonpositive-semidefinite kernels by DC algorithms, while recent papers by Xu et al. and Xue et al. address similar problems in machine learning. Liu et al. have considered IQP models related to support vector machines with indefinite kernels, a topic of growing interest. For quadratic programming under quadratic constraints, we refer to Wiebking's work.
Numerous studies have been devoted to numerical methods for solving IQP problems; see, e.g., [17,19,78–80,82,101–103]. Note that most of the existing algorithms find only stationary points (i.e., Karush-Kuhn-Tucker points) or local minimizers; that is, they are local solution methods. Since the IQP problem is NP-hard (see [72] and [17]), finding global solutions remains a significant challenge.
We are interested in studying and implementing two methods to solve the IQP problem. Both rely on the general framework for solving DC programs developed by Pham Dinh Tao and Le Thi Hoai An; a recent advancement in this direction is the combination of DCA with interior point methods for solving large-scale nonconvex quadratic programs. Two DC decompositions of the objective function are considered: the projection DC decomposition and the proximal DC decomposition. They give rise, respectively, to two algorithms for solving the IQP problem, called the Projection DC decomposition algorithm (Algorithm A) and the Proximal DC decomposition algorithm (Algorithm B); detailed descriptions can be found in [57,82]. Among the key features of these algorithms are the following:
- The algorithm descriptions are simple;
- No line searches are required.
According to the DCA theory, any cluster point of a DCA sequence generated by specific algorithms is a KKT point of the IQP To confirm the existence of such cluster points, it is essential to establish the boundedness of the DCA sequence While DCA sequences are not always bounded, a conjecture suggests that if the IQP has global solutions, then DCA sequences generated by algorithms A and B must be bounded Recent work by Tuan has affirmed this conjecture for two-dimensional IQP cases To extend this result to the general case, Tuan utilized a local error bound for affine variational inequalities and specific properties of the KKT point set derived by Luo and Tseng The main finding indicates that if the IQP has a nonempty solution set, every DCA sequence produced by Algorithm A converges R-linearly to a KKT point.
Numerous numerical tests, which will be reported in Section 2.4, lead us to the following observations:
For both algorithms A and B, a positive decomposition parameter that is closer to the lower bound of the admissible parameter interval significantly enhances the convergence rate of the DCA sequences.
- Applied to the same problem with the same initial point, Algorithm B is more efficient than Algorithm A in terms of the number of computation steps and the execution time.
Our findings align with the recent work of Le Thi, Huynh, and Pham Dinh, who provided original proofs for significant convergence theorems related to DCA algorithms addressing optimization problems with subanalytic data Specifically, their Theorems 3.4, 3.5, and 4.2 demonstrate that any DCA sequence produced by Algorithm B converges R-linearly to a KKT point, provided the sequence is bounded However, the boundedness of DCA sequences cannot be established through the Lojasiewicz inequality, as indicated in their Theorem 2.1, along with related results on Kurdyka-Lojasiewicz properties Therefore, Theorem 2.2 and its proof represent new contributions to the analysis of existing solution algorithms in the realm of indefinite quadratic programming.
Consider the indefinite quadratic programming problem under linear constraints (called the IQP problem for brevity):
min{ f(x) := (1/2)x^T Qx + q^T x | Ax ≥ b }, (2.1)
where Q ∈ R^{n×n} and A ∈ R^{m×n} are given matrices, Q is symmetric, and q ∈ R^n and b ∈ R^m are arbitrarily given vectors. The constraint set of the problem is C := {x ∈ R^n | Ax ≥ b}. Since x^T Qx is an indefinite quadratic form, the objective function f(x) may be nonconvex; hence (2.1) is a nonconvex optimization problem.
Now we describe some standard notations that will be used later on. The unit matrix in R^{n×n} is denoted by E. The eigenvalues of a symmetric matrix M ∈ R^{n×n} are ordered so that λ_1(M) ≤ ⋯ ≤ λ_n(M), counting multiplicities. For an index set α ⊂ {1, …, m}, the matrix A_α consists of the rows A_i of A with i ∈ α, and the vector b_α is formed from the components b_i of b with i ∈ α. The pseudo-face of the convex set C associated with α is the set {x ∈ R^n | A_α x = b_α, A_ᾱ x > b_ᾱ}, where ᾱ := {1, …, m} \ α. The notation B(x, ε) stands for the open ball centered at x with radius ε > 0, and B̄(x, ε) denotes the corresponding closed ball. For vectors v^1, …, v^s in R^n, the closed convex cone generated by them is pos{v^1, …, v^s} := { Σ_{i=1}^{s} t_i v^i | t_i ≥ 0, i = 1, …, s }.
The metric projection of u ∈ R^n onto C is denoted by P_C(u); that is, P_C(u) belongs to C and ‖u − P_C(u)‖ = min_{x∈C} ‖u − x‖.
The tangent cone to C at x ∈ C is denoted by T_C(x), i.e., T_C(x) = {t(y − x) | t ≥ 0, y ∈ C} = {v ∈ R^n | A_α v ≥ 0}, where α = {i | A_i x = b_i}. The normal cone to C at x ∈ C is denoted by N_C(x), i.e., N_C(x) = {w ∈ R^n | ⟨w, y − x⟩ ≤ 0 for all y ∈ C}.
To solve the IQP problem by DCA, one decomposes f(x) as the difference of two convex linear-quadratic functions, f(x) = ϕ(x) − ψ(x), where ϕ(x) := (1/2)x^T Q_1 x + q^T x and ψ(x) := (1/2)x^T Q_2 x, (2.2) with Q_1 a symmetric positive definite matrix and Q_2 a symmetric positive semidefinite matrix such that Q = Q_1 − Q_2. The IQP problem then becomes the DC program min{g(x) − h(x) | x ∈ R^n}, where g(x) := ϕ(x) + δ_C(x), with δ_C the indicator function of the feasible set C, and h(x) := ψ(x). Starting from an initial point x^0 ∈ R^n, at each step k ≥ 0 one computes y^k = ∇h(x^k) = Q_2 x^k and then finds x^{k+1} as the unique solution of the strongly convex quadratic program
min{ (1/2)x^T Q_1 x + q^T x − x^T Q_2 x^k | x ∈ C }. (2.3)
The obtained sequence {x^k} is called the DCA sequence generated by the DC algorithm and the initial point x^0.
Definition 2.1 For x ∈ R^n, if there exists a multiplier λ ∈ R^m such that Qx + q − A^T λ = 0, Ax ≥ b, λ ≥ 0, λ^T(Ax − b) = 0, (2.4) then x is said to be a Karush-Kuhn-Tucker point (a KKT point) of the IQP problem.
A point x ∈ C is a KKT point of the IQP problem (2.1) if and only if ⟨∇f(x), v⟩ = (Qx + q)^T v ≥ 0 for every v ∈ T_C(x), or, equivalently, ⟨∇f(x), y − x⟩ ≥ 0 for every y ∈ C. Thus, x is a KKT point of (2.1) if and only if x solves the affine variational inequality x ∈ C, ⟨Qx + q, u − x⟩ ≥ 0 for all u ∈ C.
Denote the KKT point set (resp., the global solution set) of the IQP problem by C* (resp., S). It is well known (see, e.g., [50]) that S ⊂ C*.
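Numerically, membership in C* can be tested directly from (2.4): check primal feasibility, then look for a nonnegative multiplier supported on the active constraints. The sketch below assumes NumPy and SciPy are available and uses a nonnegative least-squares solve; the tolerance and the helper name is_kkt_point are illustrative choices, not part of any cited reference.

```python
import numpy as np
from scipy.optimize import nnls

def is_kkt_point(Q, q, A, b, x, tol=1e-8):
    """Check the KKT system (2.4) at x for min (1/2) x^T Q x + q^T x s.t. A x >= b."""
    r = A @ x - b
    if np.min(r) < -tol:                      # primal feasibility: A x >= b
        return False
    active = r <= tol                         # complementarity forces lambda_i = 0 off the active set
    if not np.any(active):
        return np.linalg.norm(Q @ x + q) <= tol   # interior point: need Q x + q = 0
    # Stationarity: find lambda_active >= 0 with A_active^T lambda = Q x + q.
    _, residual = nnls(A[active].T, Q @ x + q)
    return residual <= tol

# Small example: f(x) = -(x1^2 + x2^2) on the box 0 <= x <= 1 (written as A x >= b).
Q = -2.0 * np.eye(2)
q = np.zeros(2)
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([0.0, 0.0, -1.0, -1.0])
print(is_kkt_point(Q, q, A, b, np.array([1.0, 1.0])))   # True: a vertex KKT point
print(is_kkt_point(Q, q, A, b, np.array([0.5, 0.5])))   # False
```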
Let us recall some properties of DCA sequences that follow from the fundamental theorem on DCAs (Theorem 1.1 and Remark 1.1) when applied to the IQP problem (2.1). Note that the modulus of convexity of the function g(x) = ϕ(x) + δ_C(x), where ϕ(x) = (1/2)x^T Q_1 x + q^T x, is at least λ_1(Q_1), while the modulus of convexity of the function h(x) = ψ(x), where ψ(x) = (1/2)x^T Q_2 x, equals λ_1(Q_2).
Theorem 2.1 (See [81, Theorem 3] and [82, Theorem 2.1]) Every DCA sequence {x^k} generated by the above DC algorithm and an initial point x^0 ∈ R^n has the following properties:
(i) f(x^{k+1}) ≤ f(x^k) − ((λ_1(Q_1) + λ_1(Q_2))/2)‖x^{k+1} − x^k‖^2 for every k ≥ 1;
(ii) {f(x^k)} converges to an upper bound f* for the optimal value of (2.1);
(iii) every cluster point x̄ of {x^k} is a KKT point of (2.1);
(iv) if inf_{x∈C} f(x) > −∞, then lim_{k→∞} ‖x^{k+1} − x^k‖ = 0.
Remark 2.1 By [81, Theorem 3], if x^0 ∈ C then the inequality in (i) holds for every k ≥ 0. To see this, it suffices to note that x^0 ∈ C = dom g, where g = ϕ + δ_C and dom g := {x | g(x) < +∞}.
The smallest eigenvalue λ_1(Q) and the largest eigenvalue λ_n(Q) of the matrix Q can be computed by standard numerical linear algebra routines. Using these values, the DC decomposition (2.2) can be done in the following two ways (a computational sketch is given after item (b)):
(a) Q_1 := ρE, Q_2 := ρE − Q, where ρ is a positive real value satisfying the condition ρ ≥ λ_n(Q);
(b) Q_1 := Q + ρE, Q_2 := ρE, where ρ is a positive real value satisfying the condition ρ > −λ_1(Q).
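The following sketch (assuming NumPy; the helper name dc_decompositions and the margin parameter are illustrative) computes λ_1(Q) and λ_n(Q) with a symmetric eigensolver and forms the matrices Q_1, Q_2 for the two decompositions (a) and (b).

```python
import numpy as np

def dc_decompositions(Q, margin=0.0):
    """Return (Q1, Q2) for decomposition (a) and for decomposition (b) of Q = Q1 - Q2.
    A positive margin pushes rho strictly inside the admissible interval."""
    eigenvalues = np.linalg.eigvalsh(Q)          # ordered: lambda_1(Q) <= ... <= lambda_n(Q)
    E = np.eye(Q.shape[0])

    rho_a = eigenvalues[-1] + margin             # (a) requires rho >= lambda_n(Q)
    Q1_a, Q2_a = rho_a * E, rho_a * E - Q        # Q2_a is positive semidefinite

    rho_b = max(-eigenvalues[0], 0.0) + max(margin, 1e-12)   # (b) requires rho > -lambda_1(Q)
    Q1_b, Q2_b = Q + rho_b * E, rho_b * E        # Q1_b is positive definite

    return (Q1_a, Q2_a), (Q1_b, Q2_b)

Q = np.array([[1.0, 2.0], [2.0, -3.0]])          # an indefinite symmetric matrix
(Q1a, Q2a), (Q1b, Q2b) = dc_decompositions(Q, margin=0.1)
print(np.linalg.eigvalsh(Q2a) >= -1e-12)         # [ True  True ]
print(np.linalg.eigvalsh(Q1b) > 0)               # [ True  True ]
```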
The number ρ is called the decomposition parameter The following algo- rithms appear on the basis of (a) and (b), respectively.
Projection DC decomposition algorithm (Algorithm A). Choose a number ρ > 0 satisfying ρ ≥ λ_n(Q) and an initial point x^0 ∈ R^n. For each k ≥ 0, the next iterate is x^{k+1} := P_C( x^k − (1/ρ)(Qx^k + q) ), which is the unique solution of (2.3) with Q_1 := ρE and Q_2 := ρE − Q. Setting y^k := Q_2 x^k, this subproblem can be rewritten as
min{ ‖x − (1/ρ)(y^k − q)‖^2 | x ∈ C }. (2.7)
The scheme of the algorithm with a stopping criterion is as follows; a computational sketch is given after the scheme. (To have an infinite DCA sequence, one has to choose ε = 0.)
Step 3. Calculate x^{k+1} by solving the convex program (2.7).
Step 4. If ‖x^{k+1} − x^k‖ ≤ ε then stop, else go to Step 5.
Step 5. Set k := k + 1 and go to Step 2.
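A compact Python sketch of Algorithm A is given below. It assumes NumPy/SciPy; the projection onto C = {x | Ax ≥ b} is itself a small convex QP and is delegated to scipy.optimize.minimize with the SLSQP method, which is adequate for tiny illustrative instances but not a production choice. The helper names are, of course, not taken from [57,82].

```python
import numpy as np
from scipy.optimize import minimize

def project_onto_C(z, A, b):
    """Euclidean projection of z onto C = {x : A x >= b} (a small convex QP)."""
    res = minimize(lambda x: 0.5 * np.sum((x - z) ** 2),
                   x0=z,
                   jac=lambda x: x - z,
                   constraints=[{"type": "ineq", "fun": lambda x: A @ x - b,
                                 "jac": lambda x: A}],
                   method="SLSQP")
    return res.x

def algorithm_A(Q, q, A, b, x0, rho=None, eps=1e-6, max_iter=500):
    """Projection DC decomposition algorithm: x_{k+1} = P_C(x_k - (Q x_k + q)/rho)."""
    if rho is None:
        rho = max(np.linalg.eigvalsh(Q)[-1], 1e-6)   # rho >= lambda_n(Q), kept positive
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_next = project_onto_C(x - (Q @ x + q) / rho, A, b)
        if np.linalg.norm(x_next - x) <= eps:
            return x_next
        x = x_next
    return x

# Tiny indefinite example on the box [0, 1]^2.
Q = np.array([[1.0, 0.0], [0.0, -2.0]])
q = np.array([-1.0, 0.5])
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([0.0, 0.0, -1.0, -1.0])
print(algorithm_A(Q, q, A, b, x0=np.array([0.2, 0.2])))   # a KKT point of this instance
```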
Proximal DC decomposition algorithm (Algorithm B). Choose a number ρ > 0 satisfying ρ > −λ_1(Q) and an initial point x^0 ∈ R^n. For each k ≥ 0, compute the unique solution x^{k+1} of the strongly convex quadratic minimization problem
min{ ψ(x) := (1/2)x^T Q_1 x + q^T x − x^T Q_2 x^k | x ∈ C }, (2.9)
where Q_1 := Q + ρE and Q_2 := ρE. The scheme of the algorithm with a stopping criterion is as follows; a computational sketch is given after the scheme. (To have an infinite DCA sequence, one has to choose ε = 0.)
Step 2. Calculate x^{k+1} by solving the convex program (2.9).
Step 3. If ‖x^{k+1} − x^k‖ ≤ ε then stop, else go to Step 4.
Step 4. Set k := k + 1 and go to Step 2.
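For comparison, here is an analogous sketch of Algorithm B under the same assumptions (NumPy/SciPy, SLSQP for the strongly convex subproblem (2.9)); in a serious implementation one would use a dedicated QP solver, possibly warm-started with x^k.

```python
import numpy as np
from scipy.optimize import minimize

def algorithm_B(Q, q, A, b, x0, rho=None, eps=1e-6, max_iter=500):
    """Proximal DC decomposition algorithm: at each step solve the strongly convex
    QP (2.9): min (1/2) x^T (Q + rho E) x + (q - rho x_k)^T x  s.t.  A x >= b."""
    n = len(q)
    if rho is None:
        rho = max(-np.linalg.eigvalsh(Q)[0], 0.0) + 1e-3   # rho > -lambda_1(Q)
    M = Q + rho * np.eye(n)
    cons = [{"type": "ineq", "fun": lambda z: A @ z - b, "jac": lambda z: A}]
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        qk = q - rho * x
        res = minimize(lambda z: 0.5 * z @ M @ z + qk @ z,
                       x0=x, jac=lambda z: M @ z + qk,
                       constraints=cons, method="SLSQP")
        x_next = res.x
        if np.linalg.norm(x_next - x) <= eps:
            return x_next
        x = x_next
    return x

# Same tiny instance as in the Algorithm A sketch above.
Q = np.array([[1.0, 0.0], [0.0, -2.0]]); q = np.array([-1.0, 0.5])
A = np.vstack([np.eye(2), -np.eye(2)]); b = np.array([0.0, 0.0, -1.0, -1.0])
print(algorithm_B(Q, q, A, b, x0=np.array([0.2, 0.2])))   # approximately [1, 0], a KKT point
```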
Convergence and Convergence Rate of the Algorithm
Since the KKT point set C* is the solution set of an affine variational inequality, it is the union of finitely many polyhedral convex sets; in particular, C* has finitely many connected components. Since S ⊂ C*, the nonemptiness of S implies that C* is nonempty. For any nonempty subset M ⊂ R^n, the distance from a point x to M is defined by d(x, M) := inf{‖x − y‖ | y ∈ M}.
We will need two auxiliary results. The next lemma gives a local error bound for the distance d(x, C*) from a feasible point x ∈ C to C*.
Lemma 2.1 ([92, Lemma 2.1]; cf. [65, Lemma 3.1]) For any ρ > 0, if C* ≠ ∅, then there exist scalars ε > 0 and ℓ > 0 such that
d(x, C*) ≤ ℓ‖x − P_C(x − (1/ρ)(Qx + q))‖ (2.10)
for all x ∈ C with
‖x − P_C(x − (1/ρ)(Qx + q))‖ ≤ ε. (2.11)
Lemma 2.2 ([65, Lemma 3.1]; see also [92, Lemma 2.2]) Let C_1, C_2, …, C_r denote the connected components of C*. Then C* = C_1 ∪ ⋯ ∪ C_r, and the following properties are valid:
(a) each C i is the union of finitely many polyhedral convex sets;
(b) the sets C_i, i = 1, …, r, are properly separated from each other, that is, there exists δ > 0 such that if i ≠ j then d(x, C_j) ≥ δ for all x ∈ C_i;
(c) f is constant on each Ci.
Since x^{k+1} is the unique solution of (2.9), it is characterized by the condition ⟨∇ψ(x^{k+1}), x − x^{k+1}⟩ ≥ 0 for all x ∈ C, where ∇ψ(x^{k+1}) = Qx^{k+1} + q + ρx^{k+1} − ρx^k. Hence x^{k+1} is the unique solution of the strongly monotone affine variational inequality defined by the affine operator x ↦ (Q + ρE)x + q − ρx^k and the polyhedral convex set C. Therefore, applying Theorem 2.3 from [45, p. 9], we see that x^{k+1} is the unique fixed point of the map G_k(x) := P_C(x − μ(Mx + q^k)), where μ > 0 is arbitrarily chosen, M := Q + ρE, and q^k := q − ρx^k. In what follows, we choose μ = ρ^{−1}. Then
x^{k+1} = P_C( x^{k+1} − (1/ρ)(Mx^{k+1} + q^k) ). (2.12)
The convergence and the rate of convergence of Algorithm B, the Proximal DC decomposition algorithm, can be formulated as follows.
Theorem 2.2 If (2.1) has a solution, then for each x^0 ∈ R^n, the DCA sequence {x^k} constructed by Algorithm B converges R-linearly to a KKT point of (2.1); that is, there exists x̄ ∈ C* such that limsup_{k→∞} ‖x^k − x̄‖^{1/k} < 1.
Proof. Since (2.1) has a solution, C* ≠ ∅. Hence, by Lemma 2.1 there exist ℓ > 0 and ε > 0 such that (2.10) is fulfilled for any x satisfying (2.11). As inf_{x∈C} f(x) > −∞, assertion (iv) of Theorem 2.1 gives
lim_{k→∞} ‖x^{k+1} − x^k‖ = 0. (2.13)
Choose k_0 ∈ N so large that ‖x^{k+1} − x^k‖ < ε for all k ≥ k_0.
If it holds that
‖x^{k+1} − P_C(x^{k+1} − (1/ρ)(Qx^{k+1} + q))‖ ≤ ε for all k ≥ k_0, (2.14)
then by (2.10) one has
d(x^{k+1}, C*) ≤ ℓ‖x^{k+1} − P_C(x^{k+1} − (1/ρ)(Qx^{k+1} + q))‖ for all k ≥ k_0. (2.15)
To obtain (2.14), for any k ≥ k_0, we recall that
x^{k+1} = G_k(x^{k+1}) = P_C( x^{k+1} − (1/ρ)(Mx^{k+1} + q^k) ). (2.16)
Combining this with the nonexpansiveness of P_C(·) [45, Corollary 2.4, p. 10] yields
‖x^{k+1} − P_C(x^{k+1} − (1/ρ)(Qx^{k+1} + q))‖ = ‖P_C(x^{k+1} − (1/ρ)(Mx^{k+1} + q^k)) − P_C(x^{k+1} − (1/ρ)(Qx^{k+1} + q))‖ ≤ ‖(1/ρ)(Mx^{k+1} + q^k) − (1/ρ)(Qx^{k+1} + q)‖ = ‖x^{k+1} − x^k‖ < ε
for every k ≥ k_0. Hence (2.14) is valid and, in addition, we have ‖x^{k+1} − P_C(x^{k+1} − (1/ρ)(Qx^{k+1} + q))‖ ≤ ‖x^{k+1} − x^k‖.
From this and (2.15) it follows that
d(x^{k+1}, C*) ≤ ℓ‖x^{k+1} − x^k‖ for all k ≥ k_0. (2.17)
Since C* is closed and nonempty, for each k ∈ {0, 1, 2, …} we can find y^k ∈ C* such that d(x^k, C*) = ‖x^k − y^k‖. Then (2.17) implies that
‖x^{k+1} − y^{k+1}‖ ≤ ℓ‖x^{k+1} − x^k‖ for all k ≥ k_0. (2.18)
So, as a consequence of (2.13),
lim_{k→∞} ‖y^{k+1} − x^{k+1}‖ = 0. (2.19)
From (2.13) and (2.19) it follows that lim_{k→∞} ‖y^{k+1} − y^k‖ = 0. Hence, denoting by C_1, C_2, …, C_r the connected components of C* and using assertion (b) of Lemma 2.2, we can find an index i_0 ∈ {1, …, r} and a number k_1 ≥ k_0 such that y^k ∈ C_{i_0} for all k ≥ k_1. Consequently, by assertion (c) of Lemma 2.2, there is a constant c ∈ R such that
f(y^k) = c for all k ≥ k_1. (2.21)
Since (2.1) has a solution, by Theorem 2.1 we can find a real value f* such that lim_{k→∞} f(x^k) = f*.
By the classical Mean Value Theorem and the formula ∇f(x) = Qx + q, for every k there is z^k ∈ (x^k, y^k) := {(1 − t)x^k + ty^k | 0 < t < 1} such that f(y^k) − f(x^k) = ⟨Qz^k + q, y^k − x^k⟩. Since y^k is a KKT point, it holds that 0 ≤ ⟨Qy^k + q, x^k − y^k⟩. Adding this inequality and the preceding equality, we get
f(y^k) − f(x^k) ≤ ⟨Q(z^k − y^k), y^k − x^k⟩. (2.22)
On the one hand, from (2.21) and (2.22) it follows that c = f(y^k) ≤ f(x^k) + ‖Q‖‖y^k − x^k‖^2 for all k ≥ k_1. As lim_{k→∞} [f(x^k) + ‖Q‖‖y^k − x^k‖^2] = f* due to (2.19), this forces
c ≤ f*. (2.23)
On the other hand, since x^{k+1} = P_C(x^{k+1} − (1/ρ)(Mx^{k+1} + q^k)) by (2.16), the characterization of the metric projection onto a closed convex set [45, Theorem 2.3, p. 9] gives us
⟨Mx^{k+1} + q^k, y^{k+1} − x^{k+1}⟩ ≥ 0 for all k ∈ N.
From this and (2.18) we get, for all k ≥ k_0,
⟨My^{k+1} + q^k, x^{k+1} − y^{k+1}⟩ = ⟨Mx^{k+1} + q^k, x^{k+1} − y^{k+1}⟩ + ⟨M(y^{k+1} − x^{k+1}), x^{k+1} − y^{k+1}⟩ ≤ ‖M‖‖x^{k+1} − y^{k+1}‖^2 ≤ ℓ^2‖M‖‖x^{k+1} − x^k‖^2.
So, setting α = ℓ^2‖M‖, we have
⟨My^{k+1} + q^k, x^{k+1} − y^{k+1}⟩ ≤ α‖x^{k+1} − x^k‖^2. (2.24)
For each k ≥ k_1, since M = Q + ρE and q^k = q − ρx^k, invoking (2.24) and using (2.18) once more, we have
f(x^{k+1}) − c = f(x^{k+1}) − f(y^{k+1})
≤ α‖x^{k+1} − x^k‖^2 + (1/2)‖Q‖‖x^{k+1} − y^{k+1}‖^2 + ρ‖x^{k+1} − x^k‖‖x^{k+1} − y^{k+1}‖ + ρ‖x^{k+1} − y^{k+1}‖^2
≤ (α + (1/2)‖Q‖ℓ^2 + ρℓ(1 + ℓ))‖x^{k+1} − x^k‖^2.
Therefore, with β := α + (1/2)‖Q‖ℓ^2 + ρℓ(1 + ℓ), we get
f(x^{k+1}) ≤ c + β‖x^{k+1} − x^k‖^2. (2.25)
Letting k → ∞, from (2.25) we can deduce that f* = lim_{k→∞} f(x^{k+1}) ≤ c.
Combining the last expression with (2.23) yields f* = c. Therefore, by (2.25) and the first assertion of Theorem 2.1 we obtain
f(x^{k+1}) − f* ≤ β‖x^{k+1} − x^k‖^2 ≤ (2β/(λ_1(Q_1) + λ_1(Q_2)))(f(x^k) − f(x^{k+1})),
where Q_1 = Q + ρE and Q_2 = ρE. Putting γ := λ_1(Q_1) + λ_1(Q_2), from the condition ρ > −λ_1(Q) we get γ = (λ_1(Q) + ρ) + ρ > 0. Therefore,
f(x^{k+1}) − f* ≤ (2β/γ)[(f(x^k) − f*) − (f(x^{k+1}) − f*)]
for all k ≥ k_1. Setting t := 2β/γ and μ := (t/(1 + t))^{1/2} ∈ (0, 1), we obtain f(x^{k+1}) − f* ≤ μ^2(f(x^k) − f*) for all k ≥ k_1; hence there is a constant r_0 > 0 such that f(x^k) − f* ≤ r_0 μ^{2k} for all k > k_1. Consequently,
f(x^k) − f(x^{k+1}) ≤ (f(x^k) − f*) + (f(x^{k+1}) − f*) ≤ r_0 μ^{2k+2} + r_0 μ^{2k} = r_1 μ^{2k} for all k > k_1,
where r_1 := r_0(μ^2 + 1). Using the first assertion of Theorem 2.1 once more, we see that
‖x^{k+1} − x^k‖^2 ≤ (2/γ)(f(x^k) − f(x^{k+1})) ≤ (2r_1/γ)μ^{2k} for all k > k_1.
Hence ‖x^{k+1} − x^k‖ ≤ rμ^k for all k > k_1, where r := (2r_1/γ)^{1/2} and μ ∈ (0, 1). Let ε > 0 be given arbitrarily. For each positive integer p, we have
‖x^{k+p} − x^k‖ ≤ ‖x^{k+p} − x^{k+p−1}‖ + ⋯ + ‖x^{k+1} − x^k‖ ≤ r(μ^{k+p−1} + ⋯ + μ^k) ≤ (r/(1 − μ))μ^k.
Since (r/(1 − μ))μ^k < ε for all k sufficiently large, {x^k} is a Cauchy sequence; hence it converges to a point x̄ ∈ C. By the third assertion of Theorem 2.1, x̄ belongs to C*. Passing the inequality ‖x^{k+p} − x^k‖ ≤ (r/(1 − μ))μ^k to the limit as p → ∞, we get ‖x^k − x̄‖ ≤ (r/(1 − μ))μ^k for all k large enough. So ‖x^k − x̄‖^{1/k} ≤ (r/(1 − μ))^{1/k}μ for all k large enough. Therefore, limsup_{k→∞} ‖x^k − x̄‖^{1/k} ≤ μ < 1.
This proves that {x^k} converges R-linearly to a KKT point of (2.1). □
Remark 2.2 According to Theorem 2.2, one can find a constant C such that limsup_{k→∞} ‖x^k − x̄‖^{1/k} < C < 1. So, if the computation is terminated at step k, provided that k is sufficiently large, then one has ‖x^k − x̄‖^{1/k} < C, that is, ‖x^k − x̄‖ < C^k. Therefore, the computation error between the obtained approximate solution x^k and the exact limit point x̄ of the sequence {x^k} is smaller than the number C^k. Since C ∈ (0, 1), the error bound C^k tends to 0 as k → ∞.
Asymptotical Stability of the Algorithm
We will prove that DCA sequences generated by Algorithm B converge to a locally unique solution of (2.1) if the initial points are taken from a suitably-chosen neighborhood of it.
Adopting the viewpoint of discrete dynamical systems, consider an iteration algorithm that produces a unique point x^{k+1} from the previously defined point x^k, k ∈ N. Following Leong and Goh [59, Definition 2], we define the stability of a KKT point with respect to such an algorithm as follows.
Definition 2.2 The KKT point ¯x of (2.1) is:
(i) stable w.r.t. the iteration algorithm if, for any ε > 0, there exists δ > 0 such that whenever x^0 ∈ B(x̄, δ), the DCA sequence {x^k} generated by the algorithm and the initial point x^0 satisfies x^k ∈ B(x̄, ε) for all k ≥ 0;
(ii) attractive if there exists δ > 0 such that whenever x 0 ∈ B(¯x, δ), the DCA sequence generated by the iteration algorithm and the initial point x 0 has the property lim k→∞x k = ¯x;
(iii) asymptotically stable w.r.t. the iteration algorithm if it is stable and attractive w.r.t. that algorithm.
In an optimization problem defined as min{g(x) | x ∈ Ω}, where g: R n → R is a real-valued function and Ω is a subset of R n, a point ¯x ∈ Ω is considered a locally unique solution if there exists an ε > 0 such that g(x) is greater than g(¯x) for all x within the intersection of Ω and the neighborhood B(¯x, ε), excluding the point ¯x itself The following two lemmas will present some established facts related to this concept.
Lemma 2.3 (See, e.g., [50, Theorem 3.8]) If x̄ ∈ C is a locally unique solution of (2.1), then there exist μ > 0 and η > 0 such that
f(x) − f(x̄) ≥ η‖x − x̄‖^2 for every x ∈ C ∩ B(x̄, μ). (2.26)
Lemma 2.4 (See, e.g., [17, Proof of Lemma 4] and [57, Lemma 1]) If the KKT point set C ∗ contains a segment [u, x], then the restriction of f on that segment is a constant function.
The main result of this section can be formulated as follows.
Theorem 2.3 Consider Algorithm B and require additionally that \(\rho > \|Q\|\). Suppose that \(\bar x\) is a locally unique solution of problem (2.1). Then, for any \(\gamma > 0\) there exists \(\delta > 0\) such that if \(x^0 \in C \cap B(\bar x, \delta)\) and if \(\{x^k\}\) is the DCA sequence generated by Algorithm B and the initial point \(x^0\), then
(a) \(x^k \in C \cap B(\bar x, \gamma)\) for every \(k \ge 0\);
(b) \(\lim_{k\to\infty} x^k = \bar x\).
In other words, x¯ is asymptotically stable w.r.t Algorithm B.
Proof. Suppose that \(\rho > \|Q\|\) and \(\bar x\) is a locally unique solution of (2.1). By Lemma 2.3, we can choose \(\mu > 0\) and \(\eta > 0\) satisfying (2.26). Given \(\gamma > 0\), replacing \(\gamma\) by a smaller positive number if necessary, we may assume that \(\gamma \in (0, \mu)\) and \(\gamma < \mu(1 - \rho^{-1}\|Q\|)\). Since \(f(x) - f(\bar x) > 0\) for all \(x \in (C \cap B(\bar x, \gamma)) \setminus \{\bar x\}\) by (2.26), the continuity of \(f\) guarantees the existence of \(\delta \in (0, \gamma)\) such that
\[ f(x) - f(\bar x) < \eta\gamma^2 \quad \text{for every } x \in C \cap B(\bar x, \delta). \quad (2.27) \]
Let us show that, with this \(\delta > 0\), assertion (a) holds for every DCA sequence generated by Algorithm B. Fix any \(x^0 \in C \cap B(\bar x, \delta)\). Since \(\delta < \gamma\), the inclusion \(x^k \in C \cap B(\bar x, \gamma)\) holds for \(k = 0\). To argue by induction, suppose that it holds for some \(k \ge 0\). Since \(\bar x\) is a locally unique solution of (2.1), it is a KKT point of that problem, i.e.,
\[ \langle Q\bar x + q, x - \bar x\rangle \ge 0 \quad \forall x \in C. \quad (2.28) \]
We claim that
\[ \bar x = P_C\Big(\bar x - \frac{1}{\rho}(Q\bar x + q)\Big). \quad (2.29) \]
Indeed, by the characterization of the metric projection [45, Theorem 2.3, p. 9], (2.29) is valid if and only if
\[ \Big\langle \bar x - \frac{1}{\rho}(Q\bar x + q) - \bar x,\; x - \bar x\Big\rangle \le 0 \quad \forall x \in C. \]
The latter is equivalent to (2.28). Using (2.12), (2.29), and the nonexpansiveness of the metric projection [45, Corollary 2.4, p. 10], we have
\[ \|x^{k+1} - \bar x\| = \Big\| P_C\Big(x^{k+1} - \frac{1}{\rho}(M x^{k+1} + q^k)\Big) - P_C\Big(\bar x - \frac{1}{\rho}(Q\bar x + q)\Big) \Big\|, \]
whence
\[ \|x^{k+1} - \bar x\| \le \big(1 - \rho^{-1}\|Q\|\big)^{-1}\|x^k - \bar x\| \le \big(1 - \rho^{-1}\|Q\|\big)^{-1}\gamma < \mu, \]
the strict inequality resulting from the property \(\gamma < \mu(1 - \rho^{-1}\|Q\|)\). Consequently, \(x^{k+1} \in C \cap B(\bar x, \mu)\). Applying (2.26) to \(x^{k+1}\) and using the inequality \(f(x^k) \ge f(x^{k+1})\) for all \(k \ge 0\), we obtain
\[ \|x^{k+1} - \bar x\|^2 \le \frac{1}{\eta}\big(f(x^{k+1}) - f(\bar x)\big) \]
\[ \le \frac{1}{\eta}\big(f(x^{0}) - f(\bar x)\big). \]
Hence \(\|x^{k+1} - \bar x\| \le \big(\tfrac{1}{\eta}\big)^{1/2}\big(f(x^0) - f(\bar x)\big)^{1/2}\). Since \(x^0 \in C \cap B(\bar x, \delta)\), combining this with (2.27) we obtain \(\|x^{k+1} - \bar x\| < \gamma\), which means that \(x^{k+1} \in C \cap B(\bar x, \gamma)\). Thus, we have proved that \(x^k \in C \cap B(\bar x, \gamma)\) for every \(k \ge 0\).
We now prove the attractiveness. By the first part of the proof, for any \(\gamma > 0\) there exists \(\delta = \delta(\gamma) > 0\) such that assertion (a) holds whenever \(x^0 \in C \cap B(\bar x, \delta)\); moreover, we may assume that \(\gamma \in (0, \mu)\) and \(\delta \in (0, \gamma)\). We claim that, after shrinking \(\gamma\) if necessary (the validity of (a) is preserved), assertion (b) also holds for every \(x^0 \in C \cap B(\bar x, \delta)\). Suppose the claim is false. Then one can find sequences \(\gamma_j \downarrow 0\) and \(\delta_j := \delta(\gamma_j) \downarrow 0\) such that the stability assertion holds for each pair \((\delta_j, \gamma_j)\) and, for each \(j\), there is an initial point \(x^{0,j} \in C \cap B(\bar x, \delta_j)\) whose DCA sequence \(\{x^{k,j}\}_k\) does not converge to \(\bar x\). Selecting a suitable subsequence of \(\{x^{k,j}\}_k\), we obtain a limit point \(\tilde x^j \in C \cap \bar B(\bar x, \gamma_j)\), i.e., \(\|\tilde x^j - \bar x\| \le \gamma_j\), (2.30) with \(\tilde x^j \ne \bar x\); by Theorem 2.1, \(\tilde x^j \in C^*\) for all \(j\). Clearly, \(\tilde x^j \to \bar x\) as \(j \to \infty\).
For each \(j\), one can find an integer \(k(j) \ge 1\) such that \(\gamma_{j+k(j)} < \|\tilde x^j - \bar x\|\). Then, by (2.30), one has \(\|\tilde x^{j+k(j)} - \bar x\| < \|\tilde x^j - \bar x\|\).
Let \(z^1 = \tilde x^1\) and \(z^{p+1} = \tilde x^{p+k(p)}\) for \(p = 1, 2, \ldots\). Then \(\{z^p\}\) is a subsequence of \(\{\tilde x^j\}\) with \(z^p \ne z^{p'}\) whenever \(p \ne p'\). Hence, passing to a subsequence, we may assume that \(\tilde x^j \ne \tilde x^\ell\) whenever \(j \ne \ell\). Since the number of pseudo-faces of \(C\) is finite, there exists an index set \(\alpha \subset \{1, \ldots, m\}\) such that the pseudo-face
\[ F_\alpha := \{x \in \mathbb{R}^n \mid A_\alpha x = b_\alpha,\ A_{\bar\alpha} x > b_{\bar\alpha}\} \]
contains infinitely many members of the sequence \(\{\tilde x^j\}\). Without loss of generality, we may assume that the whole sequence \(\{\tilde x^j\}\) is contained in \(F_\alpha\). By [50, Lemma 4.1], \(C^* \cap F_\alpha\) is a convex set. Hence, by Lemma 2.4, the restriction of \(f\) to \(C^* \cap F_\alpha\) is a constant function. It follows that \(f(\tilde x^j) = f(\bar x)\) for all \(j\). Since \(\tilde x^j \ne \bar x\) for every \(j\), this contradicts (2.26). The claim is proved, and the theorem follows. □
To illustrate the asymptotic stability of Algorithm B, let us consider the following example.
Example 2.2 (see [50, Example 11.3, p. 207]) Consider problem (2.1) with \(n = 2\), \(m = 2\), \(Q = \begin{pmatrix} 1 & 0\\ 0 & -1\end{pmatrix}\), and \(q = (-1, 0)\), so that the objective function is \(f(x) = \frac{1}{2}(x_1^2 - x_2^2) - x_1\); the polyhedral constraint set \(C\), described by two linear inequalities, is the one given in the cited example. Since \(\lambda_1 = -1\) and \(\lambda_2 = 1\) are the eigenvalues of \(Q\), one can choose \(\rho = 2\). Using (2.4), one obtains the KKT point set \(C^* = \{(1,0),\ (\tfrac{4}{3}, \tfrac{2}{3}),\ (\tfrac{4}{3}, -\tfrac{2}{3})\}\). For this problem, one has \(S = \mathrm{loc}(P) = \{(\tfrac{4}{3}, \tfrac{2}{3}),\ (\tfrac{4}{3}, -\tfrac{2}{3})\}\). One selects an initial point, say \(x^0 = (\tfrac{3}{2}, \tfrac{1}{2})\), and chooses \(\bar x = (\tfrac{4}{3}, \tfrac{2}{3}) \in S\). Let the tolerance \(\varepsilon > 0\) be small enough and put \(\delta = \sqrt{2}\,\|x^0 - \bar x\|\).
The initial point \(x^0\) belongs to \(C \cap B(\bar x, \delta)\). The DCA sequence \(\{x^k\}\) generated by Algorithm B and \(x^0\) satisfies \(\|x^k - \bar x\| \le \varepsilon\) for each \(k \in \{0, \ldots, 27\}\); hence \(x^k \in C \cap B(\bar x, \gamma)\) with \(\gamma = \varepsilon\). Similar results are obtained by choosing \(x^0 = (\tfrac{3}{2}, -\tfrac{1}{5})\) and \(\bar x = (\tfrac{4}{3}, -\tfrac{2}{3})\).
Figure 2.1 : The DCA sequence generated by Algorithm B and x 0 = (1.5, 0.5)
Table 2.3: Asymptotic stability of Algorithm B (columns: \(k\), \(x^k\), \(f(x^k)\), \(\delta(\gamma)\) for each of the two runs).
Further Analysis
This section analyzes the impact of the decomposition parameter \(\rho\) on the convergence rates of Algorithms A and B and compares the effectiveness of the two algorithms. The algorithms were implemented in Visual C++ 2010 and tested on an Intel Core i7 processor with 4GB RAM; the solver CPLEX 11.2 was used for the linear and convex quadratic subproblems.
Recall that, for Algorithm A, the parameter \(\rho > 0\) has to satisfy the inequality \(\rho \ge \lambda_n(Q)\). For Algorithm B, \(\rho > 0\) must satisfy the strict inequality \(\rho > -\lambda_1(Q)\).
We applied Algorithms A and B to test problems of the form (2.1) in the dimensions \(n = 10, 20, 40, 60, 80\). The values \(\beta_i \in [0,10]\), \(i = 1, \ldots, n\), were randomly generated, and two distinct types of constraint sets were examined.
The constraint sets of the first type are the solution sets of linear inequality systems \(Ax \ge b\) with \(A \in \mathbb{R}^{m\times n}\) and \(b \in \mathbb{R}^m\). For each \(n \in \{10, 20, 40, 60, 80\}\), we generate a symmetric matrix \(Q \in \mathbb{R}^{n\times n}\) and a vector \(q \in \mathbb{R}^n\) whose components lie in \([0,10]\), and an initial point \(x^0 \in \mathbb{R}^n\) with components randomly chosen in \([0,5]\). We then run Algorithm A with the smallest admissible decomposition parameter, namely \(\rho = \lambda_n(Q)\) if \(\lambda_n(Q) > 0\) and \(\rho = 0.1\) otherwise; similarly, Algorithm B is run with \(\rho = -\lambda_1(Q) + 0.1\) if \(\lambda_1(Q) < 0\) and \(\rho = 0.1\) otherwise. The computation is terminated as soon as the stopping criterion \(\|x^{k+1} - x^k\| \le 10^{-6}\) is met or 1000 steps have been performed. After each test, \(\rho\) is multiplied by 1.5 and the algorithm is rerun.
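The following Python sketch illustrates this experimental protocol for Algorithm B. It is not the authors' Visual C++/CPLEX implementation: it assumes that Algorithm B takes the standard DCA form suggested by the decomposition with Hessians \(Q_1 = Q + \rho E\) and \(Q_2 = \rho E\) used earlier in this chapter (each step solves the strongly convex subproblem \(\min\{\tfrac12 x^\top(Q+\rho E)x + \langle q - \rho x^k, x\rangle : Ax \ge b\}\)), and it uses SciPy instead of CPLEX. All helper names and the random data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def dca_b_step(Q, q, rho, A, b, x_k):
    """One DCA step (assumed form of Algorithm B): minimize the strongly convex
    subproblem (1/2) x^T (Q + rho*I) x + <q - rho*x_k, x> over {x : A x >= b}."""
    n = len(q)
    Q1 = Q + rho * np.eye(n)
    obj = lambda x: 0.5 * x @ Q1 @ x + (q - rho * x_k) @ x
    grad = lambda x: Q1 @ x + (q - rho * x_k)
    cons = [{"type": "ineq", "fun": lambda x: A @ x - b, "jac": lambda x: A}]
    return minimize(obj, x_k, jac=grad, constraints=cons, method="SLSQP").x

def run_dca_b(Q, q, rho, A, b, x0, tol=1e-6, max_steps=1000):
    x = x0.copy()
    for step in range(1, max_steps + 1):
        x_new = dca_b_step(Q, q, rho, A, b, x)
        if np.linalg.norm(x_new - x) <= tol:          # stopping criterion of the tests
            return x_new, step
        x = x_new
    return x, max_steps

rng = np.random.default_rng(0)
n, m = 10, 15
M = rng.uniform(0, 10, (n, n)); Q = 0.5 * (M + M.T)  # random symmetric, possibly indefinite
q = rng.uniform(0, 10, n)
A = rng.uniform(-1, 1, (m, n))
x0 = rng.uniform(0, 5, n)                             # random initial point in [0, 5]^n
b = A @ x0 - rng.uniform(0, 1, m)                     # makes x0 feasible for A x >= b
lam1 = np.linalg.eigvalsh(Q)[0]                       # smallest eigenvalue of Q
rho = -lam1 + 0.1 if lam1 < 0 else 0.1                # smallest rho tested for Algorithm B
for trial in range(3):                                # rho is multiplied by 1.5 after each run
    x_star, steps = run_dca_b(Q, q, rho, A, b, x0)
    print(f"rho = {rho:8.3f}   steps = {steps}")
    rho *= 1.5
```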
Due to space limitations, we only present the test results for \(n = 10, 40, 80\).
Table 2.4 presents the results of applying Algorithm A and Algorithm B to the same problem with identical initial conditions The second rows of sub-tables a) and b) indicate the smallest decomposition parameters for each algorithm, while the third and fourth rows show tests with decomposition parameters that are 1.5 times larger than the previous ones The first column lists the test ordinal numbers, the second column shows the number of iterations, the third reports the running times, and the fourth contains the decomposition parameters Notably, only 11 records are displayed, as the 12th record indicates that Algorithm A requires over 1000 steps to complete the computation.
The contents of Tables 2.5–2.9 are similar to those of Table 2.4.
With any n belonging to the set {10,20,40,60,80}, a careful analysis of these Tables allows us to observe that:
• For both algorithms, if ρ increases, then the running time, as well as the number of computation steps, increases;
•For any row of the sub-tables a) and b) with the same ordinal number, the number of steps required by Algorithm B is much smaller than that required by Algorithm A.
• For any row of the sub-tables a) and b) with the same ordinal number, the running time of Algorithm B is much smaller than that of Algorithm A.
Thus, in terms of the number of computation steps and the execution time,Algorithm B is much more efficient than Algorithm A when the algorithms are applied to the same problem.
Table 2.4: The test results for \(n = 10\) with the 1st type constraint (sub-tables a) and b); columns: No., Step, Time, \(\rho\)).
Table 2.5: The test results for \(n = 10\) with the 2nd type constraint (columns as in Table 2.4).
Table 2.6: The test results for \(n = 40\) with the 1st type constraint (columns as in Table 2.4).
Table 2.7: The test results for \(n = 40\) with the 2nd type constraint (columns as in Table 2.4).
Table 2.8: The test results for \(n = 80\) with the 1st type constraint (columns as in Table 2.4).
Table 2.9: The test results for \(n = 80\) with the 2nd type constraint (columns as in Table 2.4).
Conclusions
We have established two properties of Algorithm B for the IQP problem:
- Every DCA sequence generated by Algorithm B is bounded and, moreover, converges R-linearly to a KKT point of the problem in question.
- Algorithm B is asymptotically stable whenever the initial point is sufficiently near a locally unique solution of the problem and the DCA decomposition parameter satisfies a mild additional condition.
We have carried out many numerical experiments, which demonstrate that:
- The convergence rate of DCA sequences is significantly affected by the decomposition parameter: the larger the parameter, the longer the execution time. Hence, it is advisable to select the smallest admissible decomposition parameter.
- Algorithm B is more efficient than Algorithm A on randomly generated data sets.
Qualitative Properties of the Minimum Sum-of-Squares Clustering Problems
Clustering is a crucial process in data mining that facilitates automated data analysis It involves grouping a subset of data where the elements within each cluster share similarities.
Clustering problems can be approached using various criteria, including Euclidean distance, L1-distance, and the square of Euclidean distance Among these methods, the Minimum Sum-of-Squares Clustering (MSSC) criterion is particularly popular and widely utilized in clustering analysis.
The minimum sum-of-squares clustering (MSSC) problem requires partitioning a given finite data set into a prescribed number of clusters so that the sum of the squared Euclidean distances from each data point to the centroid of its cluster is minimized.
The importance of the MSSC problem was recognized by researchers long ago, and many algorithms have been developed to solve it.
Since the problem is NP-hard, the existing algorithms can, in general, yield only local solutions. Various techniques have been employed to obtain better data partitions; in particular, a method proposed in [71] uses Difference-of-Convex-functions Algorithms (DCA) to identify good starting points, and DCA has also been applied directly to the MSSC problem in [7, 52, 60].
This chapter establishes a series of basic qualitative properties of the problem. First, we show the equivalence between its mixed integer programming formulation and the unconstrained nonsmooth nonconvex optimization formulation given in [71]. Then we prove that the MSSC problem always has a global solution and that, under a mild condition, the global solution set is finite and the components of each global solution can be computed by an explicit formula.
This chapter aims to characterize local solutions of the MSSC problem by establishing necessary and sufficient conditions for a system of centroids to qualify as a nontrivial local solution, drawing on the principles of DC programming and previous research The insights gained from these characterizations could enhance the understanding and refinement of existing algorithms focused on local solutions Additionally, we evaluate the performance of the k-means algorithm, a fundamental method for addressing the MSSC problem, through a carefully constructed example.
This chapter aims to analyze how small changes in the data set affect the optimal value, global solution set, and local solution set of the MSSC problem We will establish three key stability properties: the optimal value function is locally Lipschitz, the global solution map is locally upper Lipschitz, and the local solution map exhibits the Aubin property, assuming the original data points are pairwise distinct.
In the n-dimensional Euclidean space R^n, let A = {a1, , am} denote a finite set of data points to be grouped The objective is to partition this set into k disjoint subsets, known as clusters, where k is a positive integer and k ≤ m, while optimizing a specific clustering criterion.
If one associates to each cluster \(A_j\) a center (or centroid) \(x^j \in \mathbb{R}^n\), then the following well-known variance or SSQ (Sum-of-Squares) clustering criterion (see, e.g., [15, p. 266]) is used:
\[ \psi(x, \alpha) := \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k} \alpha_{ij}\,\|a^i - x^j\|^2 \;\longrightarrow\; \min, \]
where \(\alpha_{ij} = 1\) if \(a^i \in A_j\) and \(\alpha_{ij} = 0\) otherwise. Thus, the above partitioning problem can be formulated as the constrained optimization problem
\[ \min\Big\{\psi(x, \alpha) \;\Big|\; x \in \mathbb{R}^{nk},\ \alpha = (\alpha_{ij}) \in \mathbb{R}^{m\times k},\ \alpha_{ij} \in \{0,1\},\ \sum_{j=1}^{k}\alpha_{ij} = 1,\ i = 1, \ldots, m,\ j = 1, \ldots, k\Big\}, \quad (3.1) \]
where the centroid system \(x = (x^1, \ldots, x^k)\) and the incidence matrix \(\alpha = (\alpha_{ij})\) are to be found.
Since (3.1) is a difficult mixed integer programming problem, one usually considers instead (see, e.g., [71, p. 344]) the following unconstrained nonsmooth nonconvex optimization problem:
\[ \min\Big\{ f(x) := \frac{1}{m}\sum_{i=1}^{m}\,\min_{j\in J}\,\|a^i - x^j\|^2 \;\Big|\; x = (x^1, \ldots, x^k) \in \mathbb{R}^{nk} \Big\}. \quad (3.2) \]
The minimum sum-of-squares clustering problem (the MSSC problem, for brevity) refers to both models (3.1) and (3.2). The equivalence between these two minimization problems will be clarified below; note that their decision variables lie in different Euclidean spaces.
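As a concrete reading of formulation (3.2), here is a minimal Python sketch evaluating the nonsmooth objective \(f\) for a given centroid system; the array layout and names are illustrative, not part of the original development.

```python
import numpy as np

def mssc_objective(X, A):
    """f(x) = (1/m) * sum_i min_j ||a_i - x_j||^2, with centroids X (k x n)
    and data A (m x n), following formulation (3.2)."""
    d = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # d[i, j] = ||a_i - x_j||^2
    return d.min(axis=1).mean()

# toy data set (it reappears in Example 3.1 below): a1=(0,0), a2=(1,0), a3=(0,1), k = 2
A = np.array([[0., 0.], [1., 0.], [0., 1.]])
X = np.array([[0., 0.5], [1., 0.]])      # a centroid system attaining f = 1/6 on this data set
print(mssc_objective(X, A))              # prints 0.1666...
```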
Basic Properties of the MSSC Problem
Given a vector \(\bar x = (\bar x^1, \ldots, \bar x^k) \in \mathbb{R}^{nk}\), we inductively construct \(k\) subsets \(A_1, \ldots, A_k\) of \(A\) in the following way. Put \(A_0 = \emptyset\) and, for \(j = 1, \ldots, k\),
\[ A_j := \Big\{ a^i \in A \;\Big|\; \|a^i - \bar x^j\| = \min_{q\in J}\|a^i - \bar x^q\|,\ a^i \notin A_1 \cup \cdots \cup A_{j-1} \Big\}. \quad (3.3) \]
Thus, a data point \(a^i\) is put into the cluster \(A_j\) if \(\bar x^j\) is a nearest centroid to \(a^i\) and \(a^i\) has not already been assigned to one of the clusters \(A_1, \ldots, A_{j-1}\). We will call the family \(\{A_1, \ldots, A_k\}\) the natural clustering associated with \(\bar x\).
Definition 3.1 Let \(\bar x = (\bar x^1, \ldots, \bar x^k) \in \mathbb{R}^{nk}\). We say that the component \(\bar x^j\) of \(\bar x\) is attractive with respect to the data set \(A\) if the set
\[ A[\bar x^j] := \Big\{ a^i \in A \;\Big|\; \|a^i - \bar x^j\| = \min_{q\in J}\|a^i - \bar x^q\| \Big\} \]
is nonempty. The latter set is called the attraction set of \(\bar x^j\).
Clearly, the cluster \(A_j\) in (3.3) can be represented as \(A_j = A[\bar x^j] \setminus (A_1 \cup \cdots \cup A_{j-1})\) for every \(j \in J\).
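For computations, Definition 3.1 and rule (3.3) translate directly into code. The following minimal Python sketch (array layout and names are illustrative) returns the attraction sets and the natural clustering of a centroid system; ties are resolved in favor of the smallest cluster index, as the inductive construction prescribes.

```python
import numpy as np

def attraction_sets(X, A):
    """Indices of A[x_j] = {i : ||a_i - x_j|| = min_q ||a_i - x_q||} for each j."""
    d = np.linalg.norm(A[:, None, :] - X[None, :, :], axis=2)
    dmin = d.min(axis=1, keepdims=True)
    return [np.where(np.isclose(d[:, j], dmin[:, 0]))[0] for j in range(X.shape[0])]

def natural_clustering(X, A):
    """Clusters A_1, ..., A_k following (3.3): a_i joins A_j if x_j is a nearest
    centroid and a_i was not already assigned to A_1, ..., A_{j-1}."""
    assigned = np.zeros(A.shape[0], dtype=bool)
    clusters = []
    for S in attraction_sets(X, A):
        members = [i for i in S if not assigned[i]]
        assigned[members] = True
        clusters.append(members)
    return clusters

A = np.array([[0., 0.], [1., 0.], [0., 1.]])
print(natural_clustering(np.array([[0., 0.], [1., 0.]]), A))   # [[0, 2], [1]]
```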
Proposition 3.1 If \((\bar x, \bar\alpha)\) is a solution of (3.1), then \(\bar x\) is a solution of (3.2). Conversely, if \(\bar x\) is a solution of (3.2), then the natural clustering defined by (3.3) yields an incidence matrix \(\bar\alpha\) such that \((\bar x, \bar\alpha)\) is a solution of (3.1).
Proof. First, suppose that \((\bar x, \bar\alpha)\) is a solution of the optimization problem (3.1). Since \(\psi(\bar x, \bar\alpha) \le \psi(\bar x, \alpha)\) for every \(\alpha = (\alpha_{ij}) \in \mathbb{R}^{m\times k}\) with \(\alpha_{ij} \in \{0,1\}\) and \(\sum_{j=1}^{k}\alpha_{ij} = 1\) for all \(i \in I\), one must have
\[ \psi(\bar x, \bar\alpha) = \frac{1}{m}\sum_{i=1}^{m}\,\min_{j\in J}\|a^i - \bar x^j\|^2 = f(\bar x). \]
If \(\bar x\) were not a solution of (3.2), there would exist a point \(\tilde x = (\tilde x^1, \ldots, \tilde x^k) \in \mathbb{R}^{nk}\) with \(f(\tilde x) < f(\bar x)\). Let \(\{A_1, \ldots, A_k\}\) be the natural clustering associated with \(\tilde x\), and define \(\tilde\alpha = (\tilde\alpha_{ij}) \in \mathbb{R}^{m\times k}\) by \(\tilde\alpha_{ij} = 1\) if \(a^i \in A_j\) and \(\tilde\alpha_{ij} = 0\) otherwise. By the definition of the natural clustering and the choice of \(\tilde\alpha\), one has \(\psi(\tilde x, \tilde\alpha) = f(\tilde x)\). Consequently, \(\psi(\bar x, \bar\alpha) = f(\bar x) > f(\tilde x) = \psi(\tilde x, \tilde\alpha)\), which contradicts the assumption that \((\bar x, \bar\alpha)\) is a solution of (3.1).
Now, suppose that \(\bar x\) is a solution of (3.2). Let \(\{A_1, \ldots, A_k\}\) be the natural clustering associated with \(\bar x\) and define \(\bar\alpha = (\bar\alpha_{ij})\) by \(\bar\alpha_{ij} = 1\) if \(a^i \in A_j\) and \(\bar\alpha_{ij} = 0\) otherwise. Then \(\psi(\bar x, \bar\alpha) = f(\bar x)\). If there existed a feasible point \((x, \alpha)\) of (3.1) with \(\psi(x, \alpha) < \psi(\bar x, \bar\alpha)\), then we would obtain \(f(x) \le \psi(x, \alpha) < \psi(\bar x, \bar\alpha) = f(\bar x)\), contradicting the global optimality of \(\bar x\) for (3.2). The proof is complete. □
Proposition 3.2 If a 1 , , a m are pairwise distinct points and {A 1 , , A k } is the natural clustering associated with a global solution x¯ of (3.2), then A j is nonempty for every j ∈ J.
Proof. Suppose, on the contrary, that there exists an index \(j_0 \in J\) with \(A_{j_0} = \emptyset\). Since \(k \le m\), one can then find another index \(j_1 \in J\) such that \(A_{j_1}\) contains at least two distinct points. Choose \(a^{i_1} \in A_{j_1}\) with \(a^{i_1} \ne \bar x^{j_1}\) and define \(\tilde x\) by \(\tilde x^j = \bar x^j\) for all \(j \in J \setminus \{j_0\}\) and \(\tilde x^{j_0} = a^{i_1}\). Then
\[ f(\tilde x) - f(\bar x) \le -\frac{1}{m}\,\|a^{i_1} - \bar x^{j_1}\|^2 < 0. \]
This is impossible because \(\bar x\) is a global solution of (3.2). □
Remark 3.1 In practice, some data points may coincide. Naturally, if \(a^{i_1} = a^{i_2}\) with \(i_1 \ne i_2\), then \(a^{i_1}\) and \(a^{i_2}\) must belong to the same cluster. Procedure (3.3) guarantees the fulfillment of this natural requirement. By grouping identical data points and choosing from each group a unique representative, we obtain a new data set with pairwise distinct data points. Thus, there is no loss of generality in assuming that \(a^1, \ldots, a^m\) are pairwise distinct.
Theorem 3.1 Both problems (3.1) and (3.2) have solutions. If \(a^1, \ldots, a^m\) are pairwise distinct, then the solution sets are finite. Moreover, in that case, if \(\bar x = (\bar x^1, \ldots, \bar x^k) \in \mathbb{R}^{nk}\) is a global solution of (3.2), then the attraction set \(A[\bar x^j]\) is nonempty for every \(j \in J\) and one has
\[ \bar x^j = \frac{1}{|I(j)|}\sum_{i\in I(j)} a^i, \quad (3.4) \]
where \(I(j) := \{i \in I \mid a^i \in A[\bar x^j]\}\) and \(|\Omega|\) denotes the number of elements of a finite set \(\Omega\).
Proof. To prove that (3.2) has a solution (by the second assertion of Proposition 3.1, the solvability of (3.1) then follows), observe that the objective function of (3.2), being the minimum of finitely many continuous functions, is continuous on \(\mathbb{R}^{nk}\). If \(k = 1\), then
\[ f(x^1) = \frac{1}{m}\sum_{i=1}^{m}\|a^i - x^1\|^2. \]
This smooth, strongly convex function attains its unique global minimum on \(\mathbb{R}^n\) at the point \(\bar x^1 = a^0\), where
\[ a^0 := \frac{1}{m}\sum_{i\in I} a^i \quad (3.5) \]
is the barycenter of the data set \(A\) (see, e.g., [50, pp. 24–25] for more details).
To prove the solvability of (3.2) for \(k \ge 2\), put \(\rho := \max\{\|a^i - a^0\| \mid i \in I\}\), where \(a^0\) is given by (3.5), and let \(\bar B(a^0, 2\rho) \subset \mathbb{R}^n\) denote the closed ball centered at \(a^0\) with radius \(2\rho\). Consider the auxiliary optimization problem
\[ \min\big\{ f(x) \;\big|\; x = (x^1, \ldots, x^k) \in \mathbb{R}^{nk},\ x^j \in \bar B(a^0, 2\rho)\ \forall j \in J \big\}. \quad (3.6) \]
By the Weierstrass theorem, problem (3.6) has a solution \(\bar x = (\bar x^1, \ldots, \bar x^k)\), and \(\|\bar x^j - a^0\| \le 2\rho\) for all \(j \in J\). Take any \(x = (x^1, \ldots, x^k) \in \mathbb{R}^{nk}\). If \(\|x^j - a^0\| \le 2\rho\) for all \(j \in J\), then \(f(\bar x) \le f(x)\). Otherwise, let \(J_1 := \{j \in J \mid \|x^j - a^0\| > 2\rho\}\) and define \(\tilde x = (\tilde x^1, \ldots, \tilde x^k) \in \mathbb{R}^{nk}\) by \(\tilde x^j = x^j\) for \(j \in J \setminus J_1\) and \(\tilde x^j = a^0\) for \(j \in J_1\). For any \(i \in I\), one has \(\|a^i - \tilde x^j\| = \|a^i - a^0\| \le \rho < \|a^i - x^j\|\) for \(j \in J_1\), and \(\|a^i - \tilde x^j\| = \|a^i - x^j\|\) for \(j \in J \setminus J_1\); hence \(f(\tilde x) \le f(x)\). Since \(f(\bar x) \le f(\tilde x)\), it follows that \(f(\bar x) \le f(x)\); thus \(\bar x\) is a solution of (3.2). Furthermore, if \(a^1, \ldots, a^m\) are pairwise distinct and \(\bar x = (\bar x^1, \ldots, \bar x^k)\) is a global solution of (3.2), then, by Proposition 3.2, the natural clustering \(\{A_1, \ldots, A_k\}\) associated with \(\bar x\) satisfies \(A_j \ne \emptyset\) for all \(j \in J\).
Since \(A_j \ne \emptyset\), one has \(|I(j)| \ge 1\) for every \(j \in J\); hence the right-hand side of (3.4) is well defined. Fix any \(j \in J\). Since \(\|a^i - \bar x^j\| > \min_{q\in J}\|a^i - \bar x^q\|\) for every \(i \notin I(j)\), there exists \(\varepsilon > 0\) such that, whenever \(x^j \in B(\bar x^j, \varepsilon)\), one still has \(\|a^i - x^j\| > \min_{q\in J}\|a^i - \bar x^q\|\) for all \(i \notin I(j)\). Define \(\tilde x\) by \(\tilde x^q := \bar x^q\) for all \(q \in J \setminus \{j\}\) and \(\tilde x^j := x^j\). Splitting the sums defining \(f(\bar x)\) and \(f(\tilde x)\) over \(I(j)\) and \(I \setminus I(j)\) and using the inequality \(f(\bar x) \le f(\tilde x)\), one obtains \(\varphi(\bar x^j) \le \varphi(x^j)\) for every \(x^j \in B(\bar x^j, \varepsilon)\), where
\[ \varphi(x^j) := \frac{1}{m}\sum_{i\in I(j)}\|a^i - x^j\|^2. \]
Thus \(\varphi\) attains a local minimum at \(\bar x^j\). By the Fermat Rule, \(\nabla\varphi(\bar x^j) = 0\), or equivalently
\(\sum_{i\in I(j)}(a^i - \bar x^j) = 0\). This equality implies (3.4). Since there are only finitely many nonempty subsets \(\Omega \subset I\), the set \(B\) of the vectors \(b^\Omega\) defined by
\[ b^\Omega = \frac{1}{|\Omega|}\sum_{i\in\Omega} a^i \]
is finite; here \(b^\Omega\) is the barycenter of the subsystem \(\{a^i \in A \mid i \in \Omega\}\) of \(A\). By (3.4), every component of a global solution \(\bar x = (\bar x^1, \ldots, \bar x^k)\) of (3.2) belongs to \(B\); hence the solution set of (3.2) is finite whenever \(a^1, \ldots, a^m\) are pairwise distinct. Finally, by Proposition 3.1, if \((\bar x, \bar\alpha)\) is a solution of (3.1), then \(\bar x\) is a solution of (3.2). Since, in addition, \(\bar\alpha = (\bar\alpha_{ij}) \in \mathbb{R}^{m\times k}\) with \(\bar\alpha_{ij} \in \{0,1\}\) and \(\sum_{j=1}^{k}\bar\alpha_{ij} = 1\) for all \(i \in I\), it follows that the solution set of (3.1) is also finite. □
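Theorem 3.1 shows that, for pairwise distinct data points, every component of a global solution is the barycenter of some nonempty subset of \(A\). Consequently, for very small instances a global solution can be found by enumerating all assignments of the \(m\) points to \(k\) nonempty clusters and comparing the resulting barycenter systems. The sketch below is only meant to illustrate this finiteness argument (it is exponential in \(m\) and not a practical method); names and data are illustrative.

```python
import numpy as np
from itertools import product

def brute_force_global(A, k):
    """Enumerate assignments of the m points to k clusters (discarding those with an
    empty cluster), form the barycenter of each cluster, and return the centroid
    system with the smallest value of f from (3.2)."""
    m = A.shape[0]
    best_val, best_X = np.inf, None
    for labels in product(range(k), repeat=m):
        if len(set(labels)) < k:              # skip assignments with an empty cluster
            continue
        labels = np.array(labels)
        X = np.vstack([A[labels == j].mean(axis=0) for j in range(k)])
        d = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        val = d.min(axis=1).mean()
        if val < best_val:
            best_val, best_X = val, X
    return best_val, best_X

A = np.array([[0., 0.], [1., 0.], [0., 1.]])
print(brute_force_global(A, 2))   # value 1/6 with centroids (1/2, 0) and (0, 1)
```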
Proposition 3.3 If \(\bar x = (\bar x^1, \ldots, \bar x^k) \in \mathbb{R}^{nk}\) is a global solution of (3.2), then the components of \(\bar x\) are pairwise distinct, i.e., \(\bar x^{j_1} \ne \bar x^{j_2}\) whenever \(j_1 \ne j_2\).
Proof. Suppose, on the contrary, that there exist distinct indexes \(j_1, j_2 \in J\) with \(\bar x^{j_1} = \bar x^{j_2}\). Then the components of \(\bar x\) form at most \(k - 1\) distinct points; since \(k - 1 < m\), there must exist an index \(j_0 \in J\) with \(|A[\bar x^{j_0}]| \ge 2\). Choose a data point \(a^{i_0} \in A[\bar x^{j_0}]\) with \(a^{i_0} \ne \bar x^{j_0}\) and define \(\tilde x = (\tilde x^1, \ldots, \tilde x^k)\) by \(\tilde x^j = \bar x^j\) for all \(j \in J \setminus \{j_2\}\) and \(\tilde x^{j_2} = a^{i_0}\). Then \(f(\tilde x) - f(\bar x) < 0\), which contradicts the fact that \(\bar x\) is a global solution of (3.2). □
Remark 3.2 If the points \(a^1, \ldots, a^m\) are not pairwise distinct, then the conclusions of Theorem 3.1 do not hold in general. Indeed, let \(A = \{a^1, a^2\} \subset \mathbb{R}^2\) with \(a^1 = a^2\). For \(k := 2\), let \(\bar x = (\bar x^1, \bar x^2)\) with \(\bar x^1 = a^1\) and \(\bar x^2 \in \mathbb{R}^2\) chosen arbitrarily. Since \(f(\bar x) = 0\), the point \(\bar x\) is a global solution of (3.2); so the problem has an unbounded solution set. Similarly, consider a data set \(A = \{a^1, \ldots, a^4\} \subset \mathbb{R}^2\) with \(a^1 = a^2\), \(a^3 = a^4\), and \(a^2 \ne a^3\). For \(k := 3\), let \(\bar x = (\bar x^1, \bar x^2, \bar x^3)\) with \(\bar x^1 = a^1\), \(\bar x^2 = a^3\), and \(\bar x^3 \in \mathbb{R}^2\) chosen arbitrarily. Since \(f(\bar x) = 0\), the point \(\bar x\) is a global solution of (3.2); hence the solution set of (3.2) is unbounded. Moreover, if \(\bar x^3 \notin \{\bar x^1, \bar x^2\}\), then formula (3.4) cannot be applied to \(\bar x^3\), because the index set \(I(3) = \{i \in I \mid a^i \in A[\bar x^3]\} = \{i \in I \mid \|a^i - \bar x^3\| = \min_{q\in J}\|a^i - \bar x^q\|\}\) is empty.
Formula (3.4) is effective for computing certain components of any given local solution of (3.2) The precise statement of this result is as follows.
Theorem 3.2 If x¯= (¯x 1 , ,x¯ k ) ∈ R nk is a local solution of (3.2), then (3.4) is valid for all j ∈ J whose index set I(j) is nonempty, i.e., the component ¯ x j of x¯ is attractive w.r.t the data set A.
The k-means Algorithm
Despite its possible ineffectiveness, the k-means clustering algorithm (see, e.g., [1, pp. 89–90], [39], [43, pp. 263–266], and [66]) is one of the most popular solution methods for (3.2). The convergence of this algorithm was proved in [86].
The algorithm proceeds as follows. First, one selects \(k\) initial centroids \(x^1, \ldots, x^k\) in \(\mathbb{R}^n\). Then one constructs the clusters \(A_1, \ldots, A_k\) from the data set \(A\) by the natural clustering rule (3.3), putting \(A_0 = \emptyset\) and treating \(x^j\) as the centroid of \(A_j\). After the clusters have been formed, each centroid \(x^j\) with \(A_j \ne \emptyset\) is updated by the rule
\[ x^j \longleftarrow \tilde x^j := \frac{1}{|I(A_j)|}\sum_{i\in I(A_j)} a^i, \qquad I(A_j) := \{i \in I \mid a^i \in A_j\}, \quad (3.9) \]
while the centroids of the empty clusters are kept unchanged. The assignment and update steps are repeated until the centroid system \(\{x^1, \ldots, x^k\}\) stabilizes, i.e., until \(\tilde x^j = x^j\) for all \(j \in J\) with \(A_j \ne \emptyset\).
Input: The data set A= {a 1 , , a m } and a constant ε ≥ 0 (tolerance). Output: The set of k centroids {x 1 , , x k }.
Step 1 Select initial centroids x j ∈ R n for all j ∈ J.
Step 2. Compute \(\alpha_i = \min\{\|a^i - x^j\| \mid j \in J\}\) for all \(i \in I\).
Step 3. Using the values \(\alpha_1, \ldots, \alpha_m\), divide \(A\) into the clusters \(A_1, \ldots, A_k\) by the natural clustering rule (3.3).
Step 4. Update the centroids \(x^j\) satisfying \(A_j \ne \emptyset\) by the rule (3.9), keeping the other centroids unchanged.
Step 5. Check the convergence condition: if \(\|\tilde x^j - x^j\| \le \varepsilon\) for all \(j \in J\) with \(A_j \ne \emptyset\), then stop; otherwise, go to Step 2.
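The scheme of Steps 1–5 can be rendered compactly as follows. This is only a sketch under the conventions above: ties in the assignment are broken by the smallest cluster index, as in the natural clustering (3.3), and the centroids of empty clusters are kept unchanged, as in Step 4.

```python
import numpy as np

def k_means(A, X0, eps=0.0, max_iter=100):
    """k-means as in Steps 1-5: assign points by the natural clustering, update the
    centroids of nonempty clusters by rule (3.9), stop when centroids move by <= eps."""
    X = X0.astype(float).copy()
    k = X.shape[0]
    for _ in range(max_iter):
        d = np.linalg.norm(A[:, None, :] - X[None, :, :], axis=2)
        labels = d.argmin(axis=1)                 # nearest centroid, lowest index on ties
        X_new = X.copy()
        for j in range(k):
            members = A[labels == j]
            if len(members) > 0:                  # empty clusters keep their centroid
                X_new[j] = members.mean(axis=0)
        moved = np.linalg.norm(X_new - X, axis=1)
        X = X_new
        if np.all(moved <= eps):
            break
    return X, labels

# Example 3.1(a) below: starting centroids x1 = a1, x2 = a2
A = np.array([[0., 0.], [1., 0.], [0., 1.]])
X, labels = k_means(A, np.array([[0., 0.], [1., 0.]]))
print(X)          # [[0.  0.5] [1.  0. ]]
print(labels)     # [0 1 0]
```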
The following example is designed to show how the algorithm is performed in practice.
Example 3.1 Choose m = 3, n = 2, and k = 2 Let A = {a 1 , a 2 , a 3 }, where a 1 = (0,0), a 2 = (1,0), a 3 = (0,1) Apply the k-means algorithm to solve the problem (3.2) with the tolerance ε = 0.
(a) With the starting centroids x 1 = a 1 , x 2 = a 2 , one obtains the clusters
\(A_1 = A[x^1] = \{a^1, a^3\}\) and \(A_2 = A[x^2] = \{a^2\}\). The updated centroids are \(x^1 = (0, \tfrac12)\), \(x^2 = a^2\). Then the new clusters \(A_1\) and \(A_2\) coincide with the old ones; thus \(\|\tilde x^j - x^j\| = 0\) for all \(j \in J\) with \(A_j \ne \emptyset\), and the computation terminates. For \(x^1 = (0, \tfrac12)\), \(x^2 = a^2\), one has \(f(x) = \tfrac16\).
(b) Starting with the points \(x^1 = (\tfrac14, \tfrac34)\) and \(x^2 = (2,3)\), one gets the clusters \(A_1 = A[x^1] = \{a^1, a^2, a^3\}\) and \(A_2 = A[x^2] = \emptyset\). The algorithm gives the centroid system \(x^1 = (\tfrac13, \tfrac13)\), \(x^2 = (2,3)\), and \(f(x) = \tfrac49\).
(c) Starting with \(x^1 = (0,1)\) and \(x^2 = (0,0)\), the algorithm leads to \(A_1 = A[x^1] = \{a^3\}\), \(A_2 = A[x^2] = \{a^1, a^2\}\), \(x^1 = (0,1)\), and \(x^2 = (\tfrac12, 0)\). The corresponding value of the objective function is \(f(x) = \tfrac16\).
(d) Starting with \(x^1 = (0,0)\) and \(x^2 = (\tfrac12, \tfrac12)\), the algorithm yields \(A_1 = A[x^1] = \{a^1\}\), \(A_2 = A[x^2] = \{a^2, a^3\}\), \(x^1 = (0,0)\), and \(x^2 = (\tfrac12, \tfrac12)\). The corresponding value of the objective function is \(f(x) = \tfrac13\).
(e) Choosing \(x^1 = \ldots\) and \(x^2 = (\ldots, 0)\) as the initial centroids, one obtains the results \(A_1 = A[x^1] = \{a^1, a^2, a^3\}\), \(A_2 = A[x^2] = \emptyset\), \(x^1 = (\tfrac13, \tfrac13)\), \(x^2 = (1 + \ldots, 0)\), and \(f(x) = \tfrac49\).
The current understanding of the MSSC problem and the k-means clustering algorithm does not allow us to determine if the five centroid systems from Example 3.1 represent a global optimal solution for clustering While it is confirmed that the centroid systems in items (a) and (c) are global optimal solutions, it remains uncertain whether the centroid systems in items (b), (d), and (e) are local optimal solutions to the problem.
The theoretical results in Section 3.2 and the two forthcoming ones allow us to clarify the following issues related to the MSSC problem in Example3.1:
- The structure of the global solution set (see Example 3.2 below);
- The structure of the local solution set (see Example 3.3);
- The performance of the k-means algorithm (see Example 3.4).
The analysis will show that the centroid systems in (a) and (c) are global optimal solutions, the centroid systems in (b) and (d) are local-nonglobal optimal solutions, and the centroid system in (e) is not a local solution at all, even though the k-means algorithm converges to it and its objective function value coincides with that of the centroid system in (b).
Characterizations of the Local Solutions
To characterize the local solutions of (3.2), we will follow the approach of Ordin and Bagirov [71] and rely on a well-known necessary optimality condition in DC programming. For any vector \(x = (x^1, \ldots, x^k) \in \mathbb{R}^{nk}\), one has
\[ f(x) = \frac{1}{m}\sum_{i=1}^{m}\,\min_{j\in J}\|a^i - x^j\|^2. \quad (3.11) \]
Hence, the objective function \(f\) of (3.2) can be expressed [71, p. 345] as the difference of two convex functions,
\[ f(x) = f_1(x) - f_2(x), \quad (3.12) \]
where
\[ f_1(x) := \frac{1}{m}\sum_{i=1}^{m}\sum_{j\in J}\|a^i - x^j\|^2 \quad (3.13) \]
and
\[ f_2(x) := \frac{1}{m}\sum_{i=1}^{m}\,\max_{j\in J}\sum_{q\in J\setminus\{j\}}\|a^i - x^q\|^2. \quad (3.14) \]
Here \(f_1\) is a differentiable convex linear-quadratic function, while \(f_2\) is a nonsmooth convex function; both are defined on the whole space \(\mathbb{R}^{nk}\). The subdifferentials of \(f_1\) and \(f_2\) can be computed as follows. First,
\[ \partial f_1(x) = \{\nabla f_1(x)\} = \{2(x^1 - a^0, \ldots, x^k - a^0)\}, \]
where, as before, \(a^0 = b^A\) is the barycenter of the system \(\{a^1, \ldots, a^m\}\). Set
\[ \varphi_i(x) = \max_{j\in J} h_{i,j}(x) \quad (3.15) \]
with \(h_{i,j}(x) := \sum_{q\in J\setminus\{j\}}\|a^i - x^q\|^2\), and denote by
\[ J_i(x) := \{j \in J \mid h_{i,j}(x) = \varphi_i(x)\} \quad (3.16) \]
the index set on which the maximum in (3.15) is attained; note that \(f_2(x) = \frac{1}{m}\sum_{i\in I}\varphi_i(x)\). Then
\[ J_i(x) = \{j \in J \mid a^i \in A[x^j]\}. \quad (3.17) \]
Indeed, from the formula of \(h_{i,j}(x)\) it follows that
\[ h_{i,j}(x) = \sum_{q\in J}\|a^i - x^q\|^2 - \|a^i - x^j\|^2. \]
Therefore, by (3.15) we have
\[ \varphi_i(x) = \max_{j\in J}\Big[\sum_{q\in J}\|a^i - x^q\|^2 - \|a^i - x^j\|^2\Big] = \sum_{q\in J}\|a^i - x^q\|^2 - \min_{j\in J}\|a^i - x^j\|^2. \]
Thus, the maximum in (3.15) is attained exactly at those \(j\) for which the minimum \(\min_{j\in J}\|a^i - x^j\|^2\) is achieved. So, by (3.16),
\[ J_i(x) = \Big\{j \in J \;\Big|\; \|a^i - x^j\| = \min_{q\in J}\|a^i - x^q\|\Big\}, \]
which is exactly (3.17).
Using the subdifferential formula for the maximum function [20, Proposition 2.3.12] and recalling that, for convex functions, the Clarke generalized gradient coincides with the subdifferential of convex analysis, we obtain
\[ \partial\varphi_i(x) = \mathrm{co}\{\nabla h_{i,j}(x) \mid j \in J_i(x)\} = \mathrm{co}\big\{2(\tilde x^j - \tilde a^{i,j}) \mid j \in J_i(x)\big\}, \quad (3.18) \]
where
\[ \tilde x^j = (x^1, \ldots, x^{j-1}, 0_{\mathbb{R}^n}, x^{j+1}, \ldots, x^k) \quad (3.19) \]
and
\[ \tilde a^{i,j} = (a^i, \ldots, a^i, 0_{\mathbb{R}^n}, a^i, \ldots, a^i), \quad \text{with } 0_{\mathbb{R}^n} \text{ in the } j\text{-th position}. \quad (3.20) \]
By the Moreau–Rockafellar theorem [84, Theorem 23.8], one has
\[ \partial f_2(x) = \frac{1}{m}\sum_{i\in I}\partial\varphi_i(x). \quad (3.21) \]
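Before turning to the optimality analysis, the decomposition (3.12)–(3.14) and the gradient formula for \(f_1\) can be checked numerically. The following sketch is purely illustrative (random data, a finite-difference test); it is not part of the original development.

```python
import numpy as np

def f(X, A):
    d = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # d[i, j] = ||a_i - x_j||^2
    return d.min(axis=1).mean()

def f1(X, A):
    # (1/m) sum_i sum_j ||a_i - x_j||^2  -- the smooth convex part (3.13)
    d = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return d.sum(axis=1).mean()

def f2(X, A):
    # (1/m) sum_i max_j sum_{q != j} ||a_i - x_q||^2  -- the nonsmooth convex part (3.14)
    d = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return (d.sum(axis=1, keepdims=True) - d).max(axis=1).mean()

rng = np.random.default_rng(1)
A = rng.normal(size=(7, 3))           # m = 7 data points in R^3
X = rng.normal(size=(2, 3))           # k = 2 centroids
print(np.isclose(f(X, A), f1(X, A) - f2(X, A)))            # True:  f = f1 - f2

# finite-difference check of  grad f1(x) = 2 (x^1 - a^0, ..., x^k - a^0)
a0 = A.mean(axis=0)
eps, num_grad = 1e-6, np.zeros_like(X)
for j in range(X.shape[0]):
    for t in range(X.shape[1]):
        E = np.zeros_like(X); E[j, t] = eps
        num_grad[j, t] = (f1(X + E, A) - f1(X - E, A)) / (2 * eps)
print(np.allclose(num_grad, 2 * (X - a0), atol=1e-5))      # True
```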
Now, let \(x = (x^1, \ldots, x^k) \in \mathbb{R}^{nk}\) be a local solution of (3.2). By the necessary optimality condition in DC programming, which goes back to the optimality principles established by Dem'yanov et al. in quasidifferential calculus, one has
\[ \partial f_2(x) \subset \partial f_1(x). \quad (3.22) \]
Since \(\partial f_1(x)\) is a singleton, \(\partial f_2(x)\) must be a singleton too. This happens if and only if \(\partial\varphi_i(x)\) is a singleton for every \(i \in I\). By (3.18), if \(|J_i(x)| = 1\), then \(\partial\varphi_i(x)\) is a singleton. If \(|J_i(x)| > 1\), choose two elements \(j_1 < j_2\) of \(J_i(x)\). Since \(\partial\varphi_i(x)\) is a singleton, (3.18) forces \(\tilde x^{j_1} - \tilde a^{i,j_1} = \tilde x^{j_2} - \tilde a^{i,j_2}\). By (3.19) and (3.20), this happens if and only if \(x^{j_1} = x^{j_2} = a^i\). To proceed with the analysis, we need the following additional condition on the local solution \(x\):
(C1) The components of \(x\) are pairwise distinct, i.e., \(x^{j_1} \ne x^{j_2}\) whenever \(j_1 \ne j_2\).
Definition 3.2 A local solution x = (x 1 , , x k ) of (3.2) that satisfies (C1) is called a nontrivial local solution.
Remark 3.5 Proposition 3.3 shows that every global solution of (3.2) is a nontrivial local solution.
The following theorem gives a refined formulation of some fundamental facts stated in [71, p. 346]. In view of (3.17), its first claim says that, if \(x\) is a nontrivial local solution, then every data point \(a^i \in A\) has a unique nearest component \(x^j\) of \(x\); equivalently, \(a^i\) belongs to exactly one attraction set \(A[x^j]\).
Theorem 3.3 (Necessary conditions for nontrivial local optimality) Suppose that \(x = (x^1, \ldots, x^k)\) is a nontrivial local solution of (3.2). Then \(|J_i(x)| = 1\) for every \(i \in I\). Moreover, for every \(j \in J\) such that the attraction set \(A[x^j]\) of \(x^j\) is nonempty, one has
\[ x^j = \frac{1}{|I(j)|}\sum_{i\in I(j)} a^i, \quad (3.23) \]
where \(I(j) = \{i \in I \mid a^i \in A[x^j]\}\). For every \(j \in J\) with \(A[x^j] = \emptyset\), one has
\[ x^j \notin A[x], \quad (3.24) \]
where \(A[x]\) is the union of the balls \(\bar B(a^p, \|a^p - x^q\|)\) with \(p \in I\), \(q \in J\) satisfying \(p \in I(q)\).
Proof. Let \(x = (x^1, \ldots, x^k)\) be a nontrivial local solution of (3.2). First, \(|J_i(x)| = 1\) for every \(i \in I\): if \(|J_i(x)| > 1\) for some \(i\), then, as shown before Definition 3.2, there would exist indexes \(j_1 \ne j_2\) in \(J_i(x)\) with \(x^{j_1} = x^{j_2} = a^i\), contradicting condition (C1). Thus \(J_i(x)\) is a singleton, say \(J_i(x) = \{j(i)\}\), for each \(i \in I\).
For each \(i \in I\), observe by (3.15) that
\[ h_{i,j}(x) < h_{i,j(i)}(x) = \varphi_i(x) \quad \forall j \in J \setminus \{j(i)\}. \]
Hence, by the continuity of the functions \(h_{i,j}\), there exists an open neighborhood \(U_i\) of \(x\) such that
\[ h_{i,j}(y) < h_{i,j(i)}(y) \quad \forall j \in J \setminus \{j(i)\},\ \forall y \in U_i. \quad (3.25) \]
So \(\varphi_i(\cdot)\) is continuously differentiable on \(U_i\). Put \(U = \bigcap_{i\in I} U_i\). From (3.14) and (3.25) one can deduce that
\[ f_2(y) = \frac{1}{m}\sum_{i\in I} h_{i,j(i)}(y) \quad \forall y \in U. \]
Therefore, \(f_2\) is continuously differentiable on \(U\). Moreover, the formulas (3.18)–(3.20) yield
\[ \nabla f_2(y) = \frac{2}{m}\sum_{i\in I}\big(\tilde y^{\,j(i)} - \tilde a^{i,j(i)}\big) \quad \forall y \in U, \quad (3.26) \]
where \(\tilde y^{\,j(i)} = (y^1, \ldots, y^{j(i)-1}, 0_{\mathbb{R}^n}, y^{j(i)+1}, \ldots, y^k)\) and \(\tilde a^{i,j(i)} = (a^i, \ldots, a^i, 0_{\mathbb{R}^n}, a^i, \ldots, a^i)\), with \(0_{\mathbb{R}^n}\) in the \(j(i)\)-th position.
Substituting \(y = x\) into (3.26) and combining the result with (3.22), which now reads \(\nabla f_2(x) = \nabla f_1(x)\), we obtain
\[ \frac{2}{m}\sum_{i\in I}\big(\tilde x^{\,j(i)} - \tilde a^{i,j(i)}\big) = 2(x^1 - a^0, \ldots, x^k - a^0). \quad (3.27) \]
Now, fix an index \(j \in J\) with \(A[x^j] \ne \emptyset\) and compare the \(j\)-th components of both sides of (3.27).
If \(j(i) = j\), the \(j\)-th component of the vector \(\tilde x^{\,j(i)} - \tilde a^{i,j(i)} \in \mathbb{R}^{nk}\) is \(0_{\mathbb{R}^n}\); if \(j(i) \ne j\), that component equals \(x^j - a^i\). Therefore, (3.27) gives
\[ \frac{1}{m}\sum_{i\in I\setminus I(j)}(x^j - a^i) = x^j - a^0. \]
Since \(m a^0 = a^1 + \cdots + a^m\), this yields \(\sum_{i\in I(j)} a^i = |I(j)|\,x^j\). Thus, formula (3.23) is valid for any \(j \in J\) satisfying \(A[x^j] \ne \emptyset\).
Finally, let \(j_0 \in J\) be such that \(A[x^{j_0}] = \emptyset\), and suppose on the contrary that \(x^{j_0} \in \bar B(a^p, \|a^p - x^q\|)\) for some \(p \in I\) and \(q \in J\) with \(p \in I(q)\). If \(\|a^p - x^{j_0}\| = \|a^p - x^q\|\), then \(J_p(x)\) contains both \(q\) and \(j_0\), contradicting the first claim of the theorem. If \(\|a^p - x^{j_0}\| < \|a^p - x^q\|\), then \(p \notin I(q)\), again a contradiction. Hence (3.24) holds, and the proof is complete. □
It turns out that the necessary optimality conditions stated in the preceding theorem are also sufficient. Together with Theorem 3.3, the next result therefore provides a complete characterization of the nontrivial local solutions of (3.2).
Theorem 3.4 (Sufficient conditions for nontrivial local optimality) Suppose that a vector \(x = (x^1, \ldots, x^k) \in \mathbb{R}^{nk}\) satisfies condition (C1) and \(|J_i(x)| = 1\) for every \(i \in I\). If (3.23) is valid for any \(j \in J\) with \(A[x^j] \ne \emptyset\) and (3.24) is fulfilled for any \(j \in J\) with \(A[x^j] = \emptyset\), then \(x\) is a nontrivial local solution of (3.2).
Proof. Let \(x = (x^1, \ldots, x^k) \in \mathbb{R}^{nk}\) be such that (C1) holds, \(J_i(x) = \{j(i)\}\) for every \(i \in I\), (3.23) is valid for any \(j \in J\) with \(A[x^j] \ne \emptyset\), and (3.24) is satisfied for any \(j \in J\) with \(A[x^j] = \emptyset\). Then, for all \(i \in I\) and \(j' \in J \setminus \{j(i)\}\), one has \(\|a^i - x^{j(i)}\| < \|a^i - x^{j'}\|\).
Hence there exists \(\varepsilon > 0\) such that
\[ \|a^i - \tilde x^{j(i)}\| < \|a^i - \tilde x^{j'}\| \quad \forall i \in I,\ \forall j' \in J \setminus \{j(i)\} \quad (3.28) \]
whenever \(\tilde x = (\tilde x^1, \ldots, \tilde x^k) \in \mathbb{R}^{nk}\) satisfies \(\|\tilde x^q - x^q\| < \varepsilon\) for all \(q \in J\). Using the compactness of \(A[x]\) and (3.24), and taking a smaller \(\varepsilon\) if necessary, we may also assume that
\[ \tilde x^j \notin A[\tilde x] \quad \forall j \in J \text{ with } A[x^j] = \emptyset \quad (3.29) \]
whenever \(\|\tilde x^q - x^q\| < \varepsilon\) for all \(q \in J\); here \(A[\tilde x]\) denotes the union of the balls \(\bar B(a^p, \|a^p - \tilde x^q\|)\) with \(p \in I\), \(q \in J\), \(p \in I(q)\), where \(I(q) = \{i \in I \mid a^i \in A[x^q]\}\).
Fix an arbitrary vector \(\tilde x = (\tilde x^1, \ldots, \tilde x^k) \in \mathbb{R}^{nk}\) with \(\|\tilde x^q - x^q\| < \varepsilon\) for all \(q \in J\). Then, by (3.28) and (3.29), \(J_i(\tilde x) = \{j(i)\}\); so \(\min_{j\in J}\|a^i - \tilde x^j\|^2 = \|a^i - \tilde x^{j(i)}\|^2\). Therefore,
\[ f(\tilde x) = \frac{1}{m}\sum_{i\in I}\|a^i - \tilde x^{j(i)}\|^2 = \frac{1}{m}\sum_{\substack{j\in J\\ A[x^j]\ne\emptyset}}\ \sum_{i\in I(j)}\|a^i - \tilde x^{j}\|^2 \ \ge\ \frac{1}{m}\sum_{\substack{j\in J\\ A[x^j]\ne\emptyset}}\ \sum_{i\in I(j)}\|a^i - x^{j}\|^2 = f(x), \]
where the inequality is valid because (3.23) obviously yields
\[ \sum_{i\in I(j)}\|a^i - x^j\|^2 \le \sum_{i\in I(j)}\|a^i - \tilde x^j\|^2 \]
for every \(j \in J\) such that the attraction set \(A[x^j]\) of \(x^j\) is nonempty. (Note that \(x^j\) is the barycenter of \(A[x^j]\).)
The local optimality of \(x = (x^1, \ldots, x^k)\) has thus been proved. Hence, \(x\) is a nontrivial local solution of (3.2). □
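Theorems 3.3 and 3.4 together yield a directly checkable characterization. The following sketch (tolerances and names are illustrative assumptions) tests, for a centroid system with pairwise distinct components: (i) the uniqueness of the nearest centroid for every data point, (ii) the barycenter condition (3.23) for the attractive components, and (iii) condition (3.24) for the remaining components.

```python
import numpy as np

def is_nontrivial_local_solution(X, A, tol=1e-9):
    """Check the characterization of Theorems 3.3-3.4 for a centroid system X
    (k x n array with pairwise distinct rows) and a data set A (m x n array)."""
    m, k = A.shape[0], X.shape[0]
    d = np.linalg.norm(A[:, None, :] - X[None, :, :], axis=2)   # d[i, j] = ||a_i - x_j||
    dmin = d.min(axis=1)
    # (i) |J_i(x)| = 1: every data point has a unique nearest centroid
    if any(np.sum(np.isclose(d[i], dmin[i], atol=tol)) != 1 for i in range(m)):
        return False
    nearest = d.argmin(axis=1)
    for j in range(k):
        I_j = np.where(nearest == j)[0]
        if len(I_j) > 0:
            # (ii) attractive component: x_j is the barycenter of A[x_j]      (3.23)
            if not np.allclose(X[j], A[I_j].mean(axis=0), atol=1e-8):
                return False
        else:
            # (iii) non-attractive component: x_j lies outside every ball
            #       B(a_p, ||a_p - x_q||) with p in I(q)                      (3.24)
            if np.any(np.linalg.norm(A - X[j], axis=1) <= dmin + tol):
                return False
    return True

A = np.array([[0., 0.], [1., 0.], [0., 1.]])
print(is_nontrivial_local_solution(np.array([[0., .5], [1., 0.]]), A))  # True  (a global solution)
print(is_nontrivial_local_solution(np.array([[0., 0.], [.5, .5]]), A))  # True  (local, not global)
print(is_nontrivial_local_solution(np.array([[0., 0.], [1., 1.]]), A))  # False (a^2 has two nearest centroids)
```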
Example 3.2 (A local solution need not be a global solution) Consider the clustering problem described in Example 3.1. Here we have \(I = \{1,2,3\}\) and \(J = \{1,2\}\). According to Theorem 3.1, problem (3.2) has a global solution. If \(x = (x^1, x^2)\) is a global solution, then the attraction set \(A[x^j]\) is nonempty for every \(j \in J\). By Remark 3.5, \(x\) is a nontrivial local solution; hence, by Theorem 3.3, the attraction sets \(A[x^1]\) and \(A[x^2]\) are disjoint, each component \(x^j\) is the barycenter of its attraction set by formula (3.23), and \(A = A[x^1] \cup A[x^2]\). Since \(A[x^j] \subset A = \{a^1, a^2, a^3\}\) and the components of each global solution can be permuted (see Remark 3.4), we conclude that the global solution set of our problem is contained, up to permutation of components, in the set \(\{\bar x, \hat x, \tilde x\}\), where \(\bar x := \big((0,0), (\tfrac12, \tfrac12)\big)\), \(\hat x := \big((\tfrac12, 0), (0,1)\big)\), and \(\tilde x := \big((0, \tfrac12), (1,0)\big)\).
Since \(f(\bar x) = \tfrac13\) and \(f(\hat x) = f(\tilde x) = \tfrac16\), both \(\hat x\) and \(\tilde x\) are global solutions of our problem. By Theorem 3.4, \(\bar x\) is a local solution. However, \(\bar x\) does not belong to the global solution set; it is a local-nonglobal solution.
Stability Properties
This section focuses on demonstrating the local Lipschitz property of the optimal value function, the local upper Lipschitz property of the global solution map, and the local Lipschitz-like property of the local solution map associated with (3.2).
Henceforth, in connection with problem (3.2), the data set \(A = \{a^1, \ldots, a^m\}\) is subject to change; we represent it by the vector \(a = (a^1, \ldots, a^m) \in \mathbb{R}^{nm}\). The optimal value of (3.2) is denoted by \(v(a)\), i.e.,
\[ v(a) = \min\big\{ f(x) \;\big|\; x = (x^1, \ldots, x^k) \in \mathbb{R}^{nk} \big\}, \]
and the global solution set of (3.2) is denoted by \(F(a)\).
Let us abbreviate the local solution set of (3.2) to F1(a) Note that the inclusion F(a) ⊂ F 1 (a) is valid, and it may be strict.
Definition 3.3 A family {I(j) | j ∈ J} of pairwise distinct, nonempty sub- sets of I is said to be a partition of I if [ j∈J
From now on, let ¯a = (¯a 1 , ,¯a m ) ∈ R nm be a fixed vector with the property that a¯ 1 , ,a¯ m are pairwise distinct.
Theorem 3.5 (Local Lipschitz property of the optimal value function) The optimal value function \(v : \mathbb{R}^{nm} \to \mathbb{R}\) is locally Lipschitz at \(\bar a\), i.e., there exist \(L_0 > 0\) and \(\delta_0 > 0\) such that
\[ |v(a) - v(a')| \le L_0\|a - a'\| \]
for all \(a\) and \(a'\) satisfying \(\|a - \bar a\| < \delta_0\) and \(\|a' - \bar a\| < \delta_0\).
Proof. Denote by \(\Omega\) the set of all partitions of \(I\). Every element \(\omega\) of \(\Omega\) is a family \(\{I_\omega(j) \mid j \in J\}\) of pairwise distinct, nonempty subsets of \(I\) with \(\bigcup_{j\in J} I_\omega(j) = I\). To each pair \((\omega, a)\), where \(a = (a^1, \ldots, a^m) \in \mathbb{R}^{nm}\) and \(\omega \in \Omega\), we associate the vector \(x_\omega(a) = (x^1_\omega(a), \ldots, x^k_\omega(a)) \in \mathbb{R}^{nk}\) with
\[ x^j_\omega(a) = \frac{1}{|I_\omega(j)|}\sum_{i\in I_\omega(j)} a^i. \quad (3.32) \]
By Theorem 3.1, problem (3.2) with the data set \(\bar a\) has finitely many global solutions; so \(F(\bar a)\) is nonempty and finite. For each \(\bar x = (\bar x^1, \ldots, \bar x^k) \in F(\bar a)\) there exists \(\omega \in \Omega\) such that \(\bar x^j = x^j_\omega(\bar a)\) for all \(j \in J\). Let \(\Omega_1 = \{\omega_1, \ldots, \omega_r\}\) denote the set of those elements of \(\Omega\) that correspond in this way to global solutions. Moreover, one has
\[ f(x_{\omega_1}(\bar a), \bar a) < f(x_\omega(\bar a), \bar a) \quad \forall \omega \in \Omega \setminus \Omega_1, \]
where
\[ f(x, a) := \frac{1}{m}\sum_{i=1}^{m}\,\min_{j\in J}\|a^i - x^j\|^2. \quad (3.34) \]
For every pair (i, j) in the set I×J, the rule (x, a) 7→ ka i −x j k 2 establishes a polynomial function on R nk ×R nm This function is locally Lipschitz within its domain, which allows us to conclude that the function f(x, a) described in (3.34) is also locally Lipschitz on R nk ×R nm, as supported by [20, Prop 2.3.6 and 2.3.12].
Now, observe that for any ω ∈ Ω and j ∈ J, the vector function x j ω (.) in (3.32), which maps R nm to R n , is continuously differentiable In particular, it is locally Lipschitz on R nm
For every ω in the set Ω, we can conclude that the function g ω (a) = f(x ω (a), a) is locally Lipschitz on R nm By rewriting the inequality g ω 1 (¯a) < g ω (¯a) for all ω not in Ω 1, and utilizing the continuity of the functions g ω (.), we can identify a positive number δ 0 such that g ω 1 (a) < g ω (a) holds for all ω not in Ω 1, given that the distance between a and ¯a is less than δ 0 Since the points ¯a 1, , ¯a m are distinct, we can assume without loss of generality that a 1, , a m are also distinct for any vector a = (a 1, , a m) where the distance to ¯a is less than δ 0.
Consider a vector \( a = (a_1, \ldots, a_m) \) where \( ka - \bar{a} < \delta_0 \) According to equation (3.35), it follows that \( f(x_{\omega_1}(a), a) < f(x_{\omega}(a), a) \) for all \( \omega \in \Omega \setminus \Omega_1 \) Since \( f(., a) \) serves as the objective function for (3.2), this indicates that the set \( \{ x_{\omega}(a) | \omega \in \Omega \setminus \Omega_1 \} \) lacks any global solutions to the problem Theorem 3.1 confirms that the global solution set \( F(a) \) for (3.2) is included within this set.
Thus \(F(a) \subset \{x_\omega(a) \mid \omega \in \Omega_1\} = \{x_{\omega_1}(a), \ldots, x_{\omega_r}(a)\}\). Since \(F(a)\) is nonempty, \(v(a) = \min\{f(x, a) \mid x \in F(a)\}\); hence
\[ v(a) = \min_{\ell=1,\ldots,r} g_{\omega_\ell}(a) \quad \text{whenever } \|a - \bar a\| < \delta_0. \]
As the functions \(g_\omega\), \(\omega \in \Omega\), are locally Lipschitz on \(\mathbb{R}^{nm}\), standard properties of minimum functions imply that \(v\) is locally Lipschitz at \(\bar a\). □
Theorem 3.6 (Local upper Lipschitz property of the global solution map)The global solution map F : R nm ⇒ R nk is locally upper Lipschitz at ¯a, i.e., there exist L >0 and δ > 0 such that
\[ F(a) \subset F(\bar a) + L\|a - \bar a\|\,\bar B_{\mathbb{R}^{nk}} \quad (3.38) \]
for all \(a\) satisfying \(\|a - \bar a\| < \delta\). Here
\[ \bar B_{\mathbb{R}^{nk}} := \Big\{ x = (x^1, \ldots, x^k) \in \mathbb{R}^{nk} \;\Big|\; \sum_{j\in J}\|x^j\| \le 1 \Big\} \]
denotes the closed unit ball of the product space \(\mathbb{R}^{nk}\), which is equipped with the sum norm \(\|x\| = \sum_{j\in J}\|x^j\|\).
Proof. Let \(\Omega\), \(\Omega_1 = \{\omega_1, \ldots, \omega_r\}\), the vector functions \(x_\omega(a) = (x^1_\omega(a), \ldots, x^k_\omega(a))\), and the number \(\delta_0\) be constructed as in the proof of the preceding theorem. For each \(\omega \in \Omega\), the function \(x_\omega(\cdot)\) is continuously differentiable; hence there exist \(L_\omega > 0\) and \(\delta_\omega > 0\) such that
\[ \|x_\omega(a) - x_\omega(\tilde a)\| \le L_\omega\|a - \tilde a\| \quad (3.39) \]
for all \(a\) and \(\tilde a\) satisfying \(\|a - \bar a\| < \delta_\omega\) and \(\|\tilde a - \bar a\| < \delta_\omega\). Put \(L := \max\{L_{\omega_\ell} \mid \ell = 1, \ldots, r\}\) and \(\delta := \min\{\delta_0, \delta_{\omega_1}, \ldots, \delta_{\omega_r}\}\).
Then, for every \(a\) satisfying \(\|a - \bar a\| < \delta\), by (3.36) and (3.39) one has
\[ F(a) \subset \{x_{\omega_\ell}(a) \mid \ell = 1, \ldots, r\} \subset \{x_{\omega_\ell}(\bar a) \mid \ell = 1, \ldots, r\} + L\|a - \bar a\|\,\bar B_{\mathbb{R}^{nk}} = F(\bar a) + L\|a - \bar a\|\,\bar B_{\mathbb{R}^{nk}}. \]
Hence, inclusion (3.38) is valid for every \(a\) satisfying \(\|a - \bar a\| < \delta\). □
Theorem 3.7 (Aubin property of the local solution map) Let \(\bar x = (\bar x^1, \ldots, \bar x^k)\) be an element of \(F_1(\bar a)\) satisfying condition (C1), that is, \(\bar x^{j_1} \ne \bar x^{j_2}\) whenever \(j_1 \ne j_2\). Then the local solution map \(F_1 : \mathbb{R}^{nm} \rightrightarrows \mathbb{R}^{nk}\) has the Aubin property at \((\bar a, \bar x)\), i.e., there exist \(L_1 > 0\), \(\varepsilon > 0\), and \(\delta_1 > 0\) such that
\[ F_1(a) \cap B(\bar x, \varepsilon) \subset F_1(\tilde a) + L_1\|a - \tilde a\|\,\bar B_{\mathbb{R}^{nk}} \quad (3.40) \]
for all \(a\) and \(\tilde a\) satisfying \(\|a - \bar a\| < \delta_1\) and \(\|\tilde a - \bar a\| < \delta_1\).
Proof Suppose that ¯x = (¯x 1 , ,x¯ k ) ∈ F 1 (¯a) and ¯x j 1 6= ¯x j 2 for all j 1 , j 2 ∈ J with j2 6= j1 Denote by J1 the set of the indexes j ∈ J such that ¯x j is attractive w.r.t the data set {¯a 1 , ,¯a m } Put J2 = J\J1 For every j ∈ J1, by Theorem 3.3 one has k¯a i −x¯ j k< k¯a i −x¯ q k (∀i ∈ I(j), ∀q ∈ J \ {j}) (3.41)
In addition, the following holds: ¯ x j = 1
X i∈I(j) ¯ a i , (3.42) where I(j) = {i ∈ I | ¯a i ∈ A[¯x j ]} For every j ∈ J 2 , by Theorem 3.3 one has k¯x q −a¯ p k < k¯x j −a¯ p k (∀q ∈ J 1 , ∀p ∈ I(q)) (3.43) Let ε0 > 0 be such that k¯x j 1 −x¯ j 2 k > ε0 for all j1, j2 ∈ J with j2 6= j1.
By (3.41) and (3.43), there exist \(\delta_0 > 0\) and \(\varepsilon \in (0, \varepsilon_0)\) such that the inequalities
\[ \|a^i - x^j\| < \|a^i - x^q\| \quad (\forall j \in J_1,\ \forall i \in I(j),\ \forall q \in J\setminus\{j\}) \quad (3.44) \]
and
\[ \|x^q - a^p\| < \|x^j - a^p\| \quad (\forall j \in J_2,\ \forall q \in J_1,\ \forall p \in I(q)) \quad (3.45) \]
hold whenever \(a = (a^1, \ldots, a^m)\) and \(x = (x^1, \ldots, x^k)\) satisfy \(\|a - \bar a\| < \delta_0\) and \(\|x - \bar x\| < 2k\varepsilon\). Moreover, since \(\bar x^{j_1} \ne \bar x^{j_2}\) for all \(j_1 \ne j_2\) in \(J\), by taking a smaller \(\varepsilon\) if necessary we may also ensure that \(x^{j_1} \ne x^{j_2}\) for all \(j_1 \ne j_2\) whenever \(\|x - \bar x\| < 2k\varepsilon\).
For every j ∈ J1 and a = (a 1 , , a m ) ∈ R nm , define x j (a) = 1
By comparing equations (3.46) and (3.42), we find that \( x_j(\bar{a}) = \bar{x}_j \) for all \( j \in J_1 \) Utilizing the continuity of the vector functions \( x_j(.) \) for \( j \in J_1 \), we can assume that the difference \( \| x_j(e_a) - \bar{x}_j \| < \epsilon \) holds for all \( j \in J_1 \), where \( e_a = (e_{a1}, , e_{am}) \in \mathbb{R}^{nm} \) and satisfies \( \| e_a - a \bar{k} \| < \delta_0 \) If needed, a smaller \( \delta_0 > 0 \) can be chosen.
The continuously differentiable vector functions \( x_j(.) \) for \( j \in J_1 \) ensure the existence of a constant \( L_1 > 0 \), satisfying the inequality \( \| x_j(a) - x_j(ea) \| \leq L_1 \| a - ea \| \) for any points \( a \) and \( ea \) where \( \| a - \bar{a} \| < \delta_0 \) and \( \| ea - a \bar{a} \| < \delta_0 \) By selecting a smaller \( \delta_1 \in (0, \delta_0) \) such that \( 2L_1 \delta_1 < \epsilon \), we can maintain the required precision in our calculations.
With the chosen constants L 1 > 0, ε > 0, and δ 1 > 0, let us show that the inclusion (3.40) is fulfilled for all a and ea satisfying ka − ¯ak < δ 1 and kea−ak¯ < δ 1
Let \( \eta \) and \( \epsilon \) be defined such that \( k - \bar{a} < \delta_1 \) and \( k_e - \bar{a} < \delta_1 \) Consider an arbitrary element \( x = (x_1, \ldots, x_k) \) from the intersection of the set \( F_1(a) \) and the ball \( B(\bar{x}, \epsilon) \) For all \( j \in J_1 \), define \( x_{e_j} = x_j(e_a) \), where \( x_j(a) \) is specified by equation (3.46) For indices \( j \in J_2 \), maintain \( x_{e_j} = x_j \).
Claim 1 The vector xe = (xe 1 , ,xe k ) belongs to F1(ea).
The inequalities ka − ¯ak < δ 1 and kx − xk¯ < ε indicate that properties (3.44) and (3.45) are satisfied Consequently, for every j in J 1, the attraction set A[x j] is defined as {a i | i ∈ I(j)} Given that I(j) is not empty for each j in J 1 and x belongs to F 1 (a), Theorem 3.3 confirms that x j equals 1.
Comparing (3.48) with (3.46) yields x j = x j (a) for all j ∈ J 1 By (3.45) we see that, for every j ∈ J 2 , the attraction set A[x j ] is empty Moreover, one has x j ∈ A[x]/ (∀j ∈ J 2 ) (3.49) where A[x] is the union of the balls ¯B(a p ,ka p − x q k) with p ∈ I, q ∈ J satisfying p ∈ I(q).
For each j ∈ J 1 , using (3.47) we have kx j (ea)−x¯ j k ≤ kx j (ea)−x j (a)k+ kx j (a)−x¯ j k
Besides, for each j ∈ J 2 , we have kx j (ea)−x¯ j k = kx j −x¯ j k < ε Therefore, kxe−xk¯ = X j∈J 1 kx j (ea)−x¯ j k+X j∈J 2 kx j −x¯ j k< 2kε.
The inequality kea−ak¯ < δ 1 ensures that the properties (3.44) and (3.45) are maintained, with ea and x(ea) representing the variables a and x, respectively This leads to the conclusion that for all j in J 1 and i in I(j), and for all q in J excluding j, the condition kea i −xe j k < kea i −xe q k holds (3.50) Additionally, for all j in J 2, q in J 1, and p in I(q), the inequality kxe q −ea p k < kxe j −ea p k is satisfied (3.51).
So, similar to the above case of x, for every j ∈ J 1 , the attraction set A[xe j ] is {ea i | i ∈ I(j)} Recall that I(j) 6= ∅ for each j ∈ J 1 and xe j was given by xe j = x j (ea) = 1
Conclusions
The minimum sum-of-squares clustering problem consistently yields a global solution, and under certain mild conditions, this solution set is finite Each global solution's components can be calculated using an explicit formula By introducing the concept of a nontrivial local solution, we have established the necessary and sufficient conditions for a system of centroids to qualify as a nontrivial local solution.
We have demonstrated the local Lipschitz property of the optimal value function, the local upper Lipschitz property of the global solution map, and the local Lipschitz-like property of the local solution map in the MSSC problem These comprehensive characterizations of nontrivial local solutions enhance our understanding of the k-means algorithm's performance.
Some Incremental Algorithms for the Clustering Problem
Solution methods for the minimum sum-of-squares clustering (MSSC) prob- lem will be analyzed and developed in this chapter.
This article introduces enhancements to the incremental algorithms developed by Ordin and Bagirov, leveraging the Difference-of-Convex functions Algorithms (DCAs) and the qualitative properties of the MSSC problem outlined in Chapter 3 We detail the new algorithms' characteristics, such as finite convergence, overall convergence, and convergence rates Additionally, we present the outcomes of numerical tests conducted on various real-world databases to demonstrate the effectiveness of these improved algorithms.
The present chapter is written on the basis of paper No 3 and paper No 4 in the List of author’s related papers (see p 112).
There are many algorithms to solve the MSSC problem (see, e.g., [6,7,9,12,
The problem is classified as NP-hard when the number of data features or clusters is included as input, which explains why existing algorithms can typically only provide local solutions.
The k-means clustering algorithm (see Section 3.3 and see also, e.g., [1],
[39], [43], and [66]) is the best known solution method for the MSSC problem.
To improve its effectiveness, the global k-means, modified global k-means, and fast global k-means clustering algorithms have been proposed in [6,12,33,49,
The quality of computational results is significantly influenced by the choice of starting points, making it essential to identify optimal initial values The Difference-of-Convex-functions Algorithms (DCA), previously utilized for the MSSC problem, can effectively assist in this search for suitable starting points.
Incremental clustering algorithms are defined by their ability to progressively increase the number of clusters Research indicates that these algorithms are particularly effective for managing large datasets, as evidenced by various numerical results.
Ordin and Bagirov recently introduced an incremental clustering algorithm that utilizes control parameters to identify effective starting points for the k-means algorithm This builds on earlier work by Bagirov, who proposed a different incremental clustering method based on DC programming and DCA.
We will propose several improvements of the just mentioned incremental al- gorithms to solve the MSSC problem in (3.2).
Incremental clustering algorithms, as discussed in sources [7, 44, 71], begin by calculating the centroid of the entire dataset They then proceed to optimally add a new centroid at each stage, continuing this process until they identify k centroids for the specified problem (3.2).
This article focuses on the analysis and enhancement of the incremental heuristic clustering algorithm by Ordin and Bagirov, as well as Bagirov's incremental DC clustering algorithm Through the construction of specific MSSC problems with small datasets, we will demonstrate the functionality of these algorithms Notably, the second algorithm may not terminate due to its precise stopping criterion To address this, we will introduce one modified version of the incremental heuristic clustering algorithm and three modified versions of the incremental DC clustering algorithm.
This section is devoted to the incremental heuristic algorithm of Ordin andBagirov [71, pp 349–353] and some properties of the algorithm.
Let \(\ell\) be an index with \(1 \le \ell \le k - 1\) and let \(\bar x = (\bar x^1, \ldots, \bar x^\ell)\) be an approximate solution of (3.2) with \(k\) replaced by \(\ell\), i.e., \(\bar x\) solves approximately the problem
\[ \min\Big\{ f_\ell(x) := \frac{1}{m}\sum_{i=1}^{m}\,\min_{j=1,\ldots,\ell}\|a^i - x^j\|^2 \;\Big|\; x = (x^1, \ldots, x^\ell) \in \mathbb{R}^{n\ell} \Big\}. \quad (4.1) \]
Applying the natural clustering procedure (3.3) to the centroid system \(\bar x^1, \ldots, \bar x^\ell\), one divides \(A\) into \(\ell\) clusters with the centers \(\bar x^1, \ldots, \bar x^\ell\). For every \(i \in I\), put
\[ d_\ell(a^i) = \min\big\{\|\bar x^1 - a^i\|^2, \ldots, \|\bar x^\ell - a^i\|^2\big\}. \quad (4.2) \]
The formula \(g(y) = f_{\ell+1}(\bar x^1, \ldots, \bar x^\ell, y)\), where, in accordance with (4.1),
\[ f_{\ell+1}(x) = \frac{1}{m}\sum_{i=1}^{m}\,\min_{j=1,\ldots,\ell+1}\|a^i - x^j\|^2 \quad \forall x = (x^1, \ldots, x^\ell, x^{\ell+1}) \in \mathbb{R}^{n(\ell+1)}, \]
defines our auxiliary cluster function \(g : \mathbb{R}^n \to \mathbb{R}\). From (4.2) it follows that
\[ g(y) = \frac{1}{m}\sum_{i=1}^{m}\,\min\big\{ d_\ell(a^i),\ \|y - a^i\|^2 \big\}. \quad (4.3) \]
The problem
\[ \min\big\{ g(y) \;\big|\; y \in \mathbb{R}^n \big\} \quad (4.4) \]
is called the auxiliary clustering problem. For each \(i \in I\), one has
\[ \min\big\{d_\ell(a^i), \|y - a^i\|^2\big\} = d_\ell(a^i) + \|y - a^i\|^2 - \max\big\{d_\ell(a^i), \|y - a^i\|^2\big\}. \]
So, the objective function of (4.4) can be represented as \(g(y) = g_1(y) - g_2(y)\), where
\[ g_1(y) = \frac{1}{m}\sum_{i=1}^{m}\|y - a^i\|^2 \quad (4.5) \]
is a smooth convex function and
\[ g_2(y) = \frac{1}{m}\sum_{i=1}^{m}\max\big\{d_\ell(a^i), \|y - a^i\|^2\big\} \quad (4.6) \]
is a nonsmooth convex function. Consider the open set
\[ Y_1 := \big\{ y \in \mathbb{R}^n \;\big|\; \exists i \in I \text{ with } \|y - a^i\|^2 < d_\ell(a^i) \big\}, \quad (4.7) \]
which is a finite union of open balls centered at certain points \(a^i\) (\(i \in I\)), and put \(Y_2 := \mathbb{R}^n \setminus Y_1\). All the points \(\bar x^1, \ldots, \bar x^\ell\) belong to \(Y_2\). Since \(\ell < k \le m\) and the data points \(a^1, \ldots, a^m\) are pairwise distinct, at least one index \(i \in I\) satisfies \(d_\ell(a^i) > 0\), i.e., not all data points coincide with points from the set \(\{\bar x^1, \ldots, \bar x^\ell\}\); hence \(Y_1\) is nonempty. From (4.5) and (4.6) it follows that \(g(y) < \frac{1}{m}\sum_{i=1}^{m} d_\ell(a^i) = f_\ell(\bar x)\) for every \(y \in Y_1\).
Therefore, any iteration process for solving (4.4) should start with a point y 0 ∈ Y 1
To find an approximate solution of (3.2) with \(k\) replaced by \(\ell + 1\), i.e., of the problem
\[ \min\Big\{ f_{\ell+1}(x) := \frac{1}{m}\sum_{i=1}^{m}\,\min_{j=1,\ldots,\ell+1}\|a^i - x^j\|^2 \;\Big|\; x \in \mathbb{R}^{n(\ell+1)} \Big\}, \quad (4.8) \]
we can use the following procedure [71, pp. 349–351]. Fixing any \(y \in Y_1\), one divides the data set \(A\) into the two disjoint subsets
\[ A_1(y) := \big\{ a^i \in A \;\big|\; \|y - a^i\|^2 < d_\ell(a^i) \big\} \quad \text{and} \quad A_2(y) := A \setminus A_1(y). \quad (4.9) \]
Clearly, \(A_1(y)\) consists of all the data points standing closer to \(y\) than to their cluster centers. Since \(y \in Y_1\), the set \(A_1(y)\) is nonempty. Note that
\[ g(y) = \frac{1}{m}\Big[\sum_{a^i\in A_1(y)}\|y - a^i\|^2 + \sum_{a^i\in A_2(y)} d_\ell(a^i)\Big]. \quad (4.10) \]
Putting \(z_{\ell+1}(y) = f_\ell(\bar x) - g(y)\), we see that \(z_{\ell+1}(y)\) is the decrease of the minimum sum-of-squares clustering criterion obtained when the current centroid system \(\bar x^1, \ldots, \bar x^\ell\) is enlarged by the additional center \(y\); the condition \(z_{\ell+1}(y) > 0\) means that this enlargement strictly improves the criterion. From the formula
\[ f_\ell(\bar x) = \frac{1}{m}\sum_{a^i\in A} d_\ell(a^i) \]
and (4.10), one has the representation
\[ z_{\ell+1}(y) = \frac{1}{m}\sum_{a^i\in A_1(y)}\big( d_\ell(a^i) - \|y - a^i\|^2 \big), \]
which can be rewritten as
\[ z_{\ell+1}(y) = \frac{1}{m}\sum_{a^i\in A}\max\big\{0,\ d_\ell(a^i) - \|y - a^i\|^2\big\}. \quad (4.11) \]
The subsequent operations deal with the data points belonging to \(Y_1\). It is easy to check that \(a \in A \cap Y_1\) if and only if \(a \in A\) and \(a \notin \{\bar x^1, \ldots, \bar x^\ell\}\). For each point \(y = a\) with \(a \in A \cap Y_1\), the value \(z_{\ell+1}(a)\) is computed by (4.11). Then one finds
\[ z^1_{\max} := \max\big\{ z_{\ell+1}(a) \;\big|\; a \in A \cap Y_1 \big\}. \quad (4.12) \]
The choice of effective starting points for solving (4.8) is influenced by two parameters, γ 1 and γ 2, both ranging from 0 to 1 The significance of these parameters will be discussed later The selection process for these parameters is based on the computational experience gained from applying the algorithm, leading the authors of [71] to describe their approach as heuristic.
Using \(\gamma_1\), one forms the set
\[ \bar A_1 := \big\{ a \in A \cap Y_1 \;\big|\; z_{\ell+1}(a) \ge \gamma_1\, z^1_{\max} \big\}. \quad (4.13) \]
For \(\gamma_1 = 0\), the set \(\bar A_1\) consists of all the data points belonging to \(Y_1\); for \(\gamma_1 = 1\), \(\bar A_1\) contains only the points yielding the largest decrease \(z^1_{\max}\). Thus, \(\gamma_1\) expresses the tolerance in selecting suitable points from \(A \cap Y_1\). For each \(a \in \bar A_1\), one finds the set \(A_1(a)\) and computes its barycenter \(c(a)\). Then \(a\) is replaced by \(c(a)\), since the latter represents the set \(A_1(a)\) better. Note that \(c(a) \in Y_1\), because \(g(c(a)) \le g(a) < f_\ell(\bar x)\). The set of all these barycenters is denoted by
\[ \bar A_2 := \big\{ c(a) \;\big|\; a \in \bar A_1 \big\}. \quad (4.14) \]
For each \(c \in \bar A_2\), one computes the value \(z_{\ell+1}(c)\) by using (4.11). Then we find
\[ z^2_{\max} := \max\big\{ z_{\ell+1}(c) \;\big|\; c \in \bar A_2 \big\}. \quad (4.15) \]
Clearly, z max 2 is the largest decrease among the values f `+1 (¯x 1 , ,x¯ ` , c), where c ∈ A¯ 2 , in comparison with the value f ` (¯x).
Using \(\gamma_2\), one forms the set
\[ \bar A_3 := \big\{ c \in \bar A_2 \;\big|\; z_{\ell+1}(c) \ge \gamma_2\, z^2_{\max} \big\}. \quad (4.16) \]
For \(\gamma_2 = 0\), one has \(\bar A_3 = \bar A_2\); for \(\gamma_2 = 1\), \(\bar A_3\) contains only the barycenters \(c \in \bar A_2\) yielding the largest decrease of the objective function \(g(y) = f_{\ell+1}(\bar x^1, \ldots, \bar x^\ell, y)\) of (4.4). The choice \(\gamma_1 = 0\), \(\gamma_2 = 1\) corresponds to the selection of a 'good' starting point in the modified global k-means algorithm proposed by Bagirov (see [71, p. 315]). Thus, \(\gamma_2\) expresses the tolerance in selecting suitable points from \(\bar A_2\).
The set
\[ \Omega := \big\{ (\bar x^1, \ldots, \bar x^\ell, c) \;\big|\; c \in \bar A_3 \big\} \quad (4.17) \]
contains the 'good' starting points for solving (4.8).
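The starting-point selection described by (4.2) and (4.11)–(4.17) can be rendered as the following sketch. It is a simplified, illustrative implementation: tie handling, tolerances, and helper names are assumptions, and the strict inequalities are implemented with exact comparisons.

```python
import numpy as np

def starting_points(A, Xbar, gamma1, gamma2):
    """Candidate starting centroid systems for problem (4.8), following (4.12)-(4.17):
    A is the m x n data set, Xbar the current l x n centroid system."""
    # d_l(a_i) = min_j ||xbar_j - a_i||^2                                        (4.2)
    d_l = ((A[:, None, :] - Xbar[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    m = A.shape[0]

    def z(y):
        # decrease z_{l+1}(y) = (1/m) * sum_i max(0, d_l(a_i) - ||y - a_i||^2)   (4.11)
        return np.maximum(0.0, d_l - ((A - y) ** 2).sum(axis=1)).mean()

    # candidate data points A ∩ Y_1: points a with d_l(a) > 0 (not current centroids)
    Y1 = [i for i in range(m) if d_l[i] > 0]
    z1 = {i: z(A[i]) for i in Y1}
    z1max = max(z1.values())
    A1 = [i for i in Y1 if z1[i] >= gamma1 * z1max]                              # (4.13)

    # replace each a in A1 by the barycenter of A_1(a) = {a_i : ||a - a_i||^2 < d_l(a_i)}
    A2 = []
    for i in A1:
        mask = ((A - A[i]) ** 2).sum(axis=1) < d_l
        A2.append(A[mask].mean(axis=0))                                          # (4.14)
    z2 = np.array([z(c) for c in A2])
    A3 = [A2[t] for t in range(len(A2)) if z2[t] >= gamma2 * z2.max()]           # (4.16)

    # Omega: each candidate is the current system augmented by one new centroid  (4.17)
    return [np.vstack([Xbar, c]) for c in A3]
```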
4.2.2 Version 1 of Ordin-Bagirov’s algorithm