In this chapter, we present the foundational concepts of DCAs (Difference-of-Convex functions Algorithms), originally developed by Pham Dinh Tao and Le Thi Hoai An, and we define two types of linear convergence rates for vector sequences. It is well known that DCAs have a key role in nonconvex programming and many areas of applications [55]. For more details, we refer to [77, 79] and the references therein.

Basic Definitions and Some Properties
By N we denote the set of natural numbers, i.e., N = {0, 1, 2, ...}. Consider the n-dimensional Euclidean vector space X = R^n equipped with the canonical inner product
⟨x, u⟩ := ∑_{i=1}^{n} x_i u_i
for vectors x = (x_1, ..., x_n) and u = (u_1, ..., u_n). Although vectors in R^n are written as rows of real numbers in the text, they are understood as columns of real numbers in matrix computations. The transpose of a matrix A ∈ R^{m×n} is denoted by Aᵀ; thus ⟨x, u⟩ = xᵀu.
The norm in X is given by ‖x‖ = ⟨x, x⟩^{1/2}. Then the dual space Y of X can be identified with X.
A function θ : X → R ∪ {−∞, +∞} is said to be proper if it does not take the value −∞ and is not identically equal to +∞, i.e., there exists at least one x ∈ X such that θ(x) ∈ R.
The effective domain of θ is defined by domθ := {x ∈ X : θ(x) < +∞}.
Let Γ₀(X) be the set of all lower semicontinuous, proper, convex functions on X. The Fenchel conjugate function g* of a function g ∈ Γ₀(X) is defined by
g*(y) = sup{ ⟨x, y⟩ − g(x) | x ∈ X } for all y ∈ Y.
Note that g* : Y → R ∪ {+∞} is also a lower semicontinuous, proper, convex function [38, Proposition 3, p. 174]. From the definition it follows that
g(x) + g*(y) ≥ ⟨x, y⟩ for all x ∈ X and y ∈ Y.
Denote by g** the conjugate function of g*, i.e., g**(x) = sup{ ⟨x, y⟩ − g*(y) | y ∈ Y }. Since g ∈ Γ₀(X), one has g**(x) = g(x) for all x ∈ X by the Fenchel–Moreau theorem [38, Theorem 1, p. 175]. This fact is the basis for various duality theorems in convex programming and DC programming.
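To make the conjugate operation concrete, the following small numerical sketch (not part of the original text) approximates g* and g** on a grid for g(x) = (x − 1)² and checks that g** recovers g, in line with the Fenchel–Moreau theorem; the closed form g*(y) = (1/4)y² + y also appears in Example 1.3 below. Grid ranges and tolerances are illustrative assumptions.

```python
import numpy as np

# Grid approximation of the Fenchel conjugate and biconjugate of g(x) = (x - 1)^2.
# Closed form (cf. Example 1.3): g*(y) = y^2/4 + y.
xs = np.linspace(-10.0, 10.0, 4001)            # discretized primal space "X"
ys = np.linspace(-5.0, 5.0, 11)                # a few dual points in "Y"

def g(x):
    return (x - 1.0) ** 2

# g*(y) = sup_x { <x, y> - g(x) }, approximated by a maximum over the grid
g_star = np.array([np.max(y * xs - g(xs)) for y in ys])
print(np.allclose(g_star, ys ** 2 / 4 + ys, atol=1e-3))       # True

# g**(x) = sup_y { <x, y> - g*(y) }; since g is in Gamma_0(X), g** = g
ys_fine = np.linspace(-40.0, 40.0, 8001)
g_star_fine = ys_fine ** 2 / 4 + ys_fine
for x in (0.0, 0.5, 2.0):
    g_bistar = np.max(x * ys_fine - g_star_fine)
    print(abs(g_bistar - g(x)) < 1e-3)                        # True
```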
Definition 1.1 The subdifferential of a convex function ϕ : R^n → R ∪ {+∞} at u ∈ dom ϕ is the set
∂ϕ(u) := { v ∈ R^n | ⟨v, x − u⟩ ≤ ϕ(x) − ϕ(u) for all x ∈ R^n }. (1.1)
If x ∉ dom ϕ, then one puts ∂ϕ(x) = ∅.
Clearly, the subdifferential ∂ϕ(u) in (1.1) is a closed, convex set. The Fermat Rule for convex optimization problems asserts that x̄ ∈ R^n is a solution of the minimization problem min{ ϕ(x) | x ∈ R^n } if and only if 0 ∈ ∂ϕ(x̄).
We now recall some useful properties of the Fenchel conjugate functions. The proofs of the next two propositions can be found in [77].
Proposition 1.1 The inclusion x ∈ ∂g*(y) is equivalent to the equality g(x) + g*(y) = ⟨x, y⟩.
Proposition 1.2 The inclusions y ∈ ∂g(x) and x ∈ ∂g*(y) are equivalent.
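As a quick sanity check of Propositions 1.1 and 1.2, the following sketch (illustrative only, not from the original text) uses the smooth function g(x) = (x − 1)², for which ∂g(x) = {2(x − 1)}, g*(y) = (1/4)y² + y, and ∂g*(y) = {y/2 + 1} are available in closed form.

```python
import numpy as np

def g(x):       return (x - 1.0) ** 2          # g(x) = (x - 1)^2
def g_star(y):  return y ** 2 / 4 + y          # its Fenchel conjugate

rng = np.random.default_rng(0)
for x in rng.uniform(-3.0, 3.0, size=5):
    y = 2.0 * (x - 1.0)                        # the unique element of dg(x)
    # Proposition 1.2: y in dg(x)  <=>  x in dg*(y), and dg*(y) = {y/2 + 1}
    print(np.isclose(x, y / 2 + 1))            # True
    # Proposition 1.1: x in dg*(y) <=>  g(x) + g*(y) = <x, y>
    print(np.isclose(g(x) + g_star(y), x * y)) # True
```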
In the sequel, we use the convention (+∞)−(+∞)=+∞.
Definition 1.2 The optimization problem
inf{ f(x) := g(x) − h(x) | x ∈ X }, (P)
where g and h are functions belonging to Γ₀(X), is called a DC program. The functions g and h are called DC components of f.
Definition 1.3 For any g, h ∈ Γ₀(X), the DC program
inf{ h*(y) − g*(y) | y ∈ Y }, (D)
is called the dual problem of (P).
Proposition 1.3 (Toland’s Duality Theorem; see [79]) The DC programs (P) and (D) have the same optimal value.
Definition 1.4 One says that ¯x ∈ R n is a local solution of (P) if the value f(¯x) = g(¯x) − h(¯x) is finite (i.e., ¯x ∈ domg ∩ domh) and there exists a neighborhood U of ¯x such that g(¯x)−h(¯x) ≤ g(x)−h(x) ∀x ∈ U.
If we can choose U = R n , then ¯x is called a (global) solution of (P).
The set of the solutions (resp., the local solutions) of (P) is denoted by sol(P) (resp., by loc(P)).
Proposition 1.4 (First-order optimality condition; see [77]) If x¯ is a local solution of (P), then ∂h(¯x) ⊂ ∂g(¯x).
Definition 1.5 A point x̄ ∈ R^n satisfying ∂h(x̄) ⊂ ∂g(x̄) is called a stationary point of (P).
The forthcoming example, which is similar to Example 1.1 in [93], shows that a stationary point need not be a local solution.
Example 1.1 Consider the DC program (P) with f(x) = g(x) − h(x), where g(x) = |x − 1| and h(x) = (x − 1)² for all x ∈ R. For x̄ := 1/2, we have ∂g(x̄) = ∂h(x̄) = {−1}. Since ∂h(x̄) ⊂ ∂g(x̄), x̄ is a stationary point of (P). But x̄ is not a local solution of (P), because f(x) = x − x² for all x ≤ 1.
Definition 1.6 A vector x̄ ∈ R^n is said to be a critical point of (P) if ∂g(x̄) ∩ ∂h(x̄) ≠ ∅.
If ∂h(x̄) ≠ ∅ and x̄ is a stationary point of (P), then x̄ is a critical point of (P). The reverse implication does not hold in general. The following example is similar to Example 1.2 in [93].
Example 1.2 Consider the DC program (P) with f(x) = g(x) − h(x), where g(x) = (x − 1/2)² and h(x) = |x − 1| for all x ∈ R. For x̄ := 1, we have ∂g(x̄) = {1} and ∂h(x̄) = [−1, 1]. Hence ∂g(x̄) ∩ ∂h(x̄) ≠ ∅, so x̄ is a critical point of (P). But x̄ is not a stationary point of (P), because ∂h(x̄) is not a subset of ∂g(x̄).
If the set ∂h(x̄) is a singleton, then h is Gâteaux differentiable at x̄ and ∂h(x̄) = {∇_G h(x̄)}, where ∇_G h(x̄) denotes the Gâteaux derivative of h at x̄; conversely, if h is Gâteaux differentiable at x̄, then ∂h(x̄) is a singleton and ∂h(x̄) = {∇_G h(x̄)}. In this case, the condition ∂g(x̄) ∩ ∂h(x̄) ≠ ∅ is equivalent to the inclusion ∂h(x̄) ⊂ ∂g(x̄). So, if h is Gâteaux differentiable at x̄, then x̄ is a critical point of (P) if and only if it is a stationary point.
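The distinction between critical and stationary points in Examples 1.1 and 1.2 can also be checked mechanically. The sketch below (not from the original text) encodes the one-dimensional subdifferentials as closed intervals and tests the inclusion ∂h(x̄) ⊂ ∂g(x̄) and the intersection ∂g(x̄) ∩ ∂h(x̄) ≠ ∅; the closed-form subdifferentials are hard-coded assumptions matching the two examples.

```python
# Subdifferentials encoded as closed intervals [lo, hi].
def subdiff_abs(x, c=1.0):         # subdifferential of |x - c|
    if x < c:  return (-1.0, -1.0)
    if x > c:  return (1.0, 1.0)
    return (-1.0, 1.0)

def subdiff_quad(x, c=1.0):        # subdifferential of (x - c)^2, i.e. {2(x - c)}
    v = 2.0 * (x - c)
    return (v, v)

def is_subset(a, b):               # interval a contained in interval b
    return b[0] <= a[0] and a[1] <= b[1]

def intersects(a, b):              # intervals a and b have a common point
    return max(a[0], b[0]) <= min(a[1], b[1])

# Example 1.1: g = |x - 1|, h = (x - 1)^2, x_bar = 1/2.
dg, dh = subdiff_abs(0.5), subdiff_quad(0.5)
print(is_subset(dh, dg), intersects(dg, dh))   # True True   -> stationary (hence critical)

# Example 1.2: g = (x - 1/2)^2, h = |x - 1|, x_bar = 1.
dg, dh = subdiff_quad(1.0, c=0.5), subdiff_abs(1.0)
print(is_subset(dh, dg), intersects(dg, dh))   # False True  -> critical, not stationary
```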
DCA Schemes
The idea of DCAs is to decompose the DC program (P) into two sequences of convex programs (P_k) and (D_k), k ∈ N, which approximate (P) and its dual (D), respectively. One constructs two sequences {x^k} and {y^k} of solutions of the problems (P_k) and (D_k), respectively, in such a way that the following properties hold for every k ∈ N:
(i) The sequences {(g−h)(x k )} and {(h ∗ −g ∗ )(y k )} are decreasing;
(ii) Any cluster point ¯x (resp ¯y) of {x k } (resp., of {y k }) is a critical point of (P) (resp., of (D)).
Following Tuan [93], we can formulate and analyze the general DC algo- rithm of [77] as follows.
DCA Scheme 1.1
Step 1. Choose an initial point x⁰. Put k = 0.
Step 2. Find y^k and x^{k+1} satisfying
y^k ∈ ∂h(x^k), (1.2)
x^{k+1} ∈ ∂g*(y^k). (1.3)
Step 3. k ← k + 1 and return to Step 2.
For each k ≥ 0, we have thus constructed a pair (x^k, y^k) satisfying (1.2) and (1.3). Thanks to Proposition 1.2, the inclusion (1.2) can be rewritten equivalently as x^k ∈ ∂h*(y^k). Consequently, the condition (1.2) is equivalent to the requirement that y^k is a solution of the problem
min{ h*(y) − [ g*(y^{k−1}) + ⟨x^k, y − y^{k−1}⟩ ] | y ∈ Y }, (D_k)
where y^{k−1} ∈ dom g* is the vector defined at the previous step k − 1.
The inclusion x^k ∈ ∂g*(y^{k−1}) means that
g*(y) − g*(y^{k−1}) ≥ ⟨x^k, y − y^{k−1}⟩ for all y ∈ Y.
Thus the affine function y ↦ g*(y^{k−1}) + ⟨x^k, y − y^{k−1}⟩ is a lower approximation of g*(y). Replacing g*(y) by this lower approximation in the objective function of (D), we obtain the auxiliary problem (D_k).
Since (D k ) is a convex program, solving (D k ) is much easier than solving the DC program (D) Recall that y k is a solution of (D k ).
Similarly, at each step k + 1, the DC program (P) is replaced by the problem
min{ g(x) − [ h(x^k) + ⟨x − x^k, y^k⟩ ] | x ∈ X }, (P_k)
where x^k ∈ dom h* has been defined at step k.
Since (P k ) is a convex program, solving (P k ) is much easier than solving the original DC program (P) As x k+1 satisfies (1.3), it is a solution of (P k ).
The objective function of (D_k) is a convex upper approximation of the objective function of (D), and both functions have the same value at y^{k−1}. Omitting additive constants in the objective function of (D_k), we can rewrite that problem in the equivalent form
min{ h*(y) − ⟨x^k, y⟩ | y ∈ Y }. (1.4)
Similarly, the objective function of (P_k) is a convex upper approximation of the objective function of (P), and both functions have the same value at x^k. Omitting additive constants in the objective function of (P_k), we can rewrite that problem in the equivalent form
min{ g(x) − ⟨x, y^k⟩ | x ∈ X }. (1.5)
If x^k is a critical point of (P), i.e., ∂g(x^k) ∩ ∂h(x^k) ≠ ∅, then DCA may produce a sequence {(x^ℓ, y^ℓ)} with x^ℓ = x^k for all ℓ ≥ k. Indeed, since there exists a point x̄ ∈ ∂g(x^k) ∩ ∂h(x^k), to satisfy (1.2) we can choose y^k = x̄. Next, by Proposition 1.2, the inclusion (1.3) is equivalent to y^k ∈ ∂g(x^{k+1}). So, if we choose x^{k+1} = x^k, then (1.3) is fulfilled, because y^k = x̄ ∈ ∂g(x^k).
Thus, DCA can lead us to critical points, but it has no mechanism for escaping from them. When a critical point is not a local minimizer, more refined tools of variational analysis are needed to find a descent direction.
The following observations can be found in Tuan [93]:
• The DCA is a decomposition procedure which decomposes the solution of the pair of optimization problems (P) and (D) into the parallel solution of the sequence of convex minimization problems (P k ) and (D k ), k ∈ N;
• The DCA does not include any specific technique for solving the convex problems (P k ) and (D k ) Such techniques should be imported from convex programming;
• The performance of DCA depends greatly on a concrete decomposition of the objective function into DC components;
Although the DCA is a deterministic optimization method, it can produce different sequences {x^k} and {y^k} depending on the initial point x⁰ and on how the solutions y^k of (D_k) and x^{k+1} of (P_k) are selected at each iteration k, since (D_k) or (P_k) may have more than one solution.
The above analysis allows us to formulate a simplified version of DCA, which includes a termination procedure, as follows.
DCA Scheme 1.2
Output: Finite or infinite sequences {x^k} and {y^k}.
Step 1. Choose x⁰ ∈ dom g. Take ε ≥ 0. Put k = 0.
Step 2. Calculate y^k by solving the convex program (1.4). Then calculate x^{k+1} by solving the convex program (1.5).
Step 3 If ||x k+1 −x k || ≤ε then stop, else go to Step 4.
Step 4 k := k+ 1 and return to Step 2.
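A minimal sketch of DCA Scheme 1.2 in code, assuming the two convex subproblems (1.4) and (1.5) are supplied as callables with closed-form solutions; it is instantiated with the data of Example 1.3 below (g(x) = (x − 1)², h(x) = |x − 1|), for which the dual step returns a subgradient of h and the primal step is x^{k+1} = y^k/2 + 1. Function names and the tolerance are illustrative assumptions.

```python
import numpy as np

def dca(x0, solve_dual, solve_primal, eps=1e-10, max_iter=100):
    """Simplified DCA: alternate the dual step (1.4) and the primal step (1.5)
    until ||x_{k+1} - x_k|| <= eps."""
    xs, ys = [float(x0)], []
    for _ in range(max_iter):
        y = solve_dual(xs[-1])            # y_k solves  min h*(y) - <x_k, y>
        x_next = solve_primal(y)          # x_{k+1} solves  min g(x) - <x, y_k>
        ys.append(y)
        xs.append(x_next)
        if np.linalg.norm(x_next - xs[-2]) <= eps:
            break
    return xs, ys

# Data of Example 1.3: g(x) = (x - 1)^2, h(x) = |x - 1| on X = R.
solve_dual   = lambda x: float(np.sign(x - 1.0))   # a subgradient of h at x
solve_primal = lambda y: y / 2.0 + 1.0             # argmin of (x - 1)^2 - x * y

xs, ys = dca(2.0, solve_dual, solve_primal)
print(xs[-1], ys[-1])    # 1.5 1.0   (the global minimizer x_bar = 3/2)
xs, ys = dca(0.0, solve_dual, solve_primal)
print(xs[-1], ys[-1])    # 0.5 -1.0  (the other global minimizer)
```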
To understand the performance of the above DCA schemes, let us consider the following example.
Example 1.3 Consider the function f(x) = g(x) − h(x) with g(x) = (x − 1)² and h(x) = |x − 1| for all x ∈ R. Here Y = X = R and we have
g*(y) = sup{ xy − g(x) | x ∈ R } = sup{ xy − (x − 1)² | x ∈ R } = (1/4)y² + y.
Hence ∂g*(y) = { y/2 + 1 } for every y ∈ Y. For the function h, one has ∂h(x) = {−1} for x < 1, ∂h(x) = {1} for x > 1, and ∂h(x) = [−1, 1] for x = 1. Using DCA Scheme 1.1, we construct DCA sequences {x^k} and {y^k} satisfying y^k ∈ ∂h(x^k) and x^{k+1} ∈ ∂g*(y^k) for k ∈ N. Starting with any x⁰ > 1, we get y⁰ = 1 and x¹ = 3/2; hence y^k = 1 and x^k = 3/2 for all k ≥ 1, so the sequences converge to x̄ = 3/2 and ȳ = 1. Starting with any x⁰ < 1, we obtain DCA sequences with y^k = −1 and x^k = 1/2 for all k ≥ 1, which converge to x̄ = 1/2 and ȳ = −1.
Since f(x) = x² − x for x ≤ 1 and f(x) = x² − 3x + 2 for x ≥ 1, one finds that x̄ = 3/2 and x̂ = 1/2 are global minimizers of (P), and x̃ := 1 is a critical point of the problem which is neither a stationary point nor a local solution.
Now, starting with x⁰ = x̃ = 1 and choosing y⁰ = 0 ∈ ∂h(x⁰) = [−1, 1], we have x¹ ∈ ∂g*(y⁰) = ∂g*(0) = {1}, i.e., x¹ = 1. Again we can choose y¹ = 0 ∈ ∂h(x¹) = [−1, 1], and so on. This yields DCA sequences {x^k} and {y^k} converging to x̃ = 1 and ȳ = 0, respectively. Note that the limit point x̃ of the sequence {x^k} is a critical point of (P) which is neither a local minimizer nor a stationary point of (P).
To ease the presentation of some related programs, we consider the follow- ing scheme.
Output: Finite or infinite sequences {x k } and {y k }.
Step 1 Choose x 0 ∈ domg Take ε > 0 Put k = 0.
Step 2. Calculate y^k by using (1.2) and find
x^{k+1} ∈ argmin{ g(x) − ⟨x, y^k⟩ | x ∈ X }. (1.6)
Step 3 If ||x k+1 −x k || ≤ε then stop, else go to Step 4.
Step 4 k := k+ 1 and return to Step 2.
General Convergence Theorem
We will now recall the fundamental theorem on DCAs of Pham Dinh Tao and Le Thi Hoai An [77, Theorem 3], which provides a solid theoretical basis for the practical application of these algorithms. To state it, we need to recall the notions of ρ-convex function, modulus of convexity of a convex function, and strongly convex function.
Definition 1.7 Let ρ ≥ 0 and let C be a convex set in the space X. A function θ : C → R ∪ {+∞} is called ρ-convex if
θ(λx + (1 − λ)x′) ≤ λθ(x) + (1 − λ)θ(x′) − (λ(1 − λ)/2) ρ‖x − x′‖²
for all numbers λ ∈ (0, 1) and vectors x, x′ ∈ C. This amounts to saying that the function θ(·) − (ρ/2)‖·‖² is convex on C.
Definition 1.8 The modulus of convexity of θ on C is given by
ρ(θ, C) = sup{ ρ ≥ 0 | θ − (ρ/2)‖·‖² is convex on C }.
If C = X, then we write ρ(θ) instead of ρ(θ, C). The function θ is called strongly convex on C if ρ(θ, C) > 0.
Consider the problem (P). If ρ(g) > 0 (resp., ρ(g*) > 0), let ρ₁ (resp., ρ₁*) be a real number such that 0 ≤ ρ₁ < ρ(g) (resp., 0 ≤ ρ₁* < ρ(g*)). If ρ(g) = 0 (resp., ρ(g*) = 0), let ρ₁ = 0 (resp., ρ₁* = 0). If ρ(h) > 0 (resp., ρ(h*) > 0), let ρ₂ (resp., ρ₂*) be a real number such that 0 ≤ ρ₂ < ρ(h) (resp., 0 ≤ ρ₂* < ρ(h*)). If ρ(h) = 0 (resp., ρ(h*) = 0), let ρ₂ = 0 (resp., ρ₂* = 0).
The convenient abbreviations dx k := x k+1 −x k and dy k := y k+1 −y k were adopted in [77].
Theorem 1.1 ([77, Theorem 3]) Let α := inf{ f(x) = g(x) − h(x) | x ∈ R^n }. Assume that the iteration sequences {x^k} and {y^k} are generated by DCA Scheme 1.1. Then, the following properties are valid:
(i) the inequalities
(g − h)(x^{k+1}) ≤ (h* − g*)(y^k) ≤ (g − h)(x^k) − max{ ((ρ₁ + ρ₂)/2)‖dx^k‖², (ρ₁*/2)‖dy^{k−1}‖² + (ρ₂/2)‖dx^k‖², (ρ₁*/2)‖dy^{k−1}‖² + (ρ₂*/2)‖dy^k‖² }
hold for every k;
(ii) the inequalities
(h* − g*)(y^{k+1}) ≤ (g − h)(x^{k+1}) ≤ (h* − g*)(y^k) − max{ (ρ₁/2)‖dx^{k+1}‖² + (ρ₂/2)‖dx^k‖², (ρ₁*/2)‖dy^k‖² + (ρ₂/2)‖dx^k‖² }
hold for every k;
(iii) If α is finite, then {(g − h)(x^k)} and {(h* − g*)(y^k)} are decreasing sequences that converge to the same limit β ≥ α. Furthermore,
(a) if ρ(g) + ρ(h) > 0 (resp., ρ(g*) + ρ(h*) > 0), then lim_{k→∞} (x^{k+1} − x^k) = 0 (resp., lim_{k→∞} (y^{k+1} − y^k) = 0);
(iv) If α is finite, and {x^k} and {y^k} are bounded, then for every cluster point x̄ of {x^k} (resp., ȳ of {y^k}), there is a cluster point ȳ of {y^k} (resp., x̄ of {x^k}) such that ȳ ∈ ∂g(x̄) ∩ ∂h(x̄); in particular, x̄ is a critical point of (P) and ȳ is a critical point of (D).
The estimates in the assertions (i) and (ii) of the above theorem can be slightly improved as shown in the next remark.
Remark 1.1 Suppose that ρ(h) > 0, so that ρ₂ can be any real number in [0, ρ(h)). The sequences {x^k} and {y^k} are constructed independently of the constants ρ₁, ρ₁*, ρ₂, ρ₂*; hence assertion (i) of Theorem 1.1 holds, for every k ∈ N, for each admissible choice of these constants. Passing the corresponding inequality to the limit as ρ₂ → ρ(h), one obtains the same estimate with ρ₂ replaced by ρ(h).
Treating in the same way all the constants associated with the strongly convex functions among {g, h, g*, h*}, one can show that the estimates in assertions (i) and (ii) of Theorem 1.1 remain valid when ρ₁, ρ₂, ρ₁*, ρ₂* are replaced by ρ(g), ρ(h), ρ(g*), ρ(h*), respectively. The forthcoming example is designed as an illustration of Theorem 1.1.
Example 1.4 Consider the function f(x) = g(x) − h(x) from Example 1.1, where g(x) = |x − 1| and h(x) = (x − 1)² for all x ∈ R. Here Y = X = R and we have
h*(y) = sup{ xy − h(x) | x ∈ R } = sup{ xy − (x − 1)² | x ∈ R } = (1/4)y² + y.
Using DCA Scheme 1.2, we calculate DCA sequences {x^k} and {y^k} by solving, respectively, the convex programs (1.4) and (1.5) for k ∈ N. Choose ε = 0. First, select x⁰ = 2/3. Since y⁰ is a solution of (1.4) for k = 0, we get y⁰ = −2/3. Solving (1.5) for k = 0 gives x¹ = 1; then y^k = 0 for all k ≥ 1 and x^k = 1 for all k ≥ 2. The stopping criterion is satisfied at k = 1, and the algorithm yields the point x̄ = x² = 1, which is the unique local solution of (P). The same conclusion holds for any initial point x⁰ ∈ (1/2, 3/2). If x⁰ = 1/2 or x⁰ = 3/2, then one can take x¹ = x⁰, so the algorithm stops at k = 0 and yields x̄ = x¹ = x⁰, which is a stationary point of (P) that is not a local solution. If x⁰ < 1/2 or x⁰ > 3/2, then f(x^k) → −∞ as k → ∞, so {x^k} does not have any cluster point.
Convergence Rates
In Chapters 2 and 4, we will demonstrate various results regarding the convergence rates of iterative sequences, focusing on two specific types of linear convergence: Q-linear convergence and R-linear convergence These concepts will be revisited in the subsequent sections.
Definition 1.9 (See, e.g., [70, p. 28] and [88, pp. 293–294]) One says that a sequence {x^k} ⊂ R^n converges Q-linearly to a vector x̄ ∈ R^n if there exists β ∈ (0, 1) such that ‖x^{k+1} − x̄‖ ≤ β‖x^k − x̄‖ for all k sufficiently large.
Clearly, if x^k ≠ x̄, then the relation ‖x^{k+1} − x̄‖ ≤ β‖x^k − x̄‖ in Definition 1.9 can be rewritten equivalently as ‖x^{k+1} − x̄‖ / ‖x^k − x̄‖ ≤ β. The word “Q”, which stands for “quotient”, comes from this context.
Definition 1.10 (See, e.g., [70, p. 30]) One says that a sequence {x^k} ⊂ R^n converges R-linearly to a vector x̄ ∈ R^n if there is a sequence of nonnegative scalars {μ_k} such that ‖x^k − x̄‖ ≤ μ_k for all k sufficiently large, and {μ_k} converges Q-linearly to 0.
If a sequence {x^k} converges Q-linearly to x̄, then it also converges R-linearly to x̄. Indeed, choose β ∈ (0, 1) satisfying the requirement of Definition 1.9 and put μ_k := β‖x^{k−1} − x̄‖ for all k ≥ 1. Then ‖x^k − x̄‖ ≤ μ_k for all k sufficiently large, and {μ_k} converges Q-linearly to 0 because μ_{k+1} ≤ βμ_k for all large k. However, R-linear convergence does not imply Q-linear convergence. For instance, consider the sequence of positive scalars {x^k} with x^k = 1 + 2^{−k} if k is even and x^k = 1 if k is odd, and observe that {x^k} converges R-linearly to 1, while it does not converge Q-linearly to 1.
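The following sketch (illustrative; the sequence x_k = 1 + 2^{−k} for even k and x_k = 1 for odd k is an assumed concrete instance of the example above) checks numerically that such a sequence fails the quotient test of Definition 1.9 while satisfying the bound of Definition 1.10 and the root test (1.7).

```python
import numpy as np

k = np.arange(1, 21)
x = np.where(k % 2 == 0, 1.0 + 2.0 ** (-k), 1.0)    # assumed sequence
err = np.abs(x - 1.0)

# Q-linear test fails: err jumps from 0 (odd k) to a positive value (even k).
print(any(err[i] == 0.0 and err[i + 1] > 0.0 for i in range(len(err) - 1)))   # True

# R-linear bound: err_k <= mu_k with mu_k = 2^{-k}, and {mu_k} -> 0 Q-linearly.
mu = 2.0 ** (-k)
print(np.all(err <= mu))                                                      # True

# Root test (1.7): the k-th roots of the nonzero errors stay at 0.5 < 1.
print(np.max(err[err > 0] ** (1.0 / k[err > 0])))                             # 0.5
```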
Sometimes, one says that a sequence {x^k} ⊂ R^n converges R-linearly to a vector x̄ ∈ R^n whenever
lim sup_{k→∞} ‖x^k − x̄‖^{1/k} < 1 (1.7)
(see, e.g., [92]). The word “R”, which stands for “root”, comes from this context.
The next proposition clarifies the equivalence between the characterization of R-linear convergence in (1.7) and the definition given in Definition 1.10.
Proposition 1.5 A sequence {x k } ⊂ R n converges R-linearly to a vector ¯ x ∈ R n if and only if the strict inequality (1.7) holds.
Proof. Necessity: Suppose that {x^k} converges R-linearly to x̄. Then there exists a sequence of nonnegative scalars {μ_k} such that ‖x^k − x̄‖ ≤ μ_k for all k sufficiently large and {μ_k} converges Q-linearly to 0. Hence we can find β ∈ (0, 1) and an integer k₁ such that μ_{k+1} ≤ βμ_k for all k ≥ k₁. Assuming μ_{k₁} > 0, for any k > k₁ we have ‖x^k − x̄‖ ≤ μ_k ≤ βμ_{k−1} ≤ ... ≤ β^{k−k₁} μ_{k₁}.
It follows that ‖x^k − x̄‖ ≤ (μ_{k₁} β^{−k₁}) β^k for all k > k₁. Therefore,
lim sup_{k→∞} ‖x^k − x̄‖^{1/k} ≤ β lim sup_{k→∞} (μ_{k₁} β^{−k₁})^{1/k} = β < 1.
Sufficiency: Assume that (1.7) holds. Then there exist γ ∈ (0, 1) and a natural number k₂ such that ‖x^k − x̄‖ ≤ γ^k for all k ≥ k₂. Define μ_k := γ^k. Then ‖x^k − x̄‖ ≤ μ_k for k ≥ k₂, μ_{k+1} = γμ_k for all k, and lim_{k→∞} μ_k = 0, so {μ_k} converges Q-linearly to 0. Thus {x^k} converges R-linearly to x̄. □
Analysis of an Algorithm in Indefinite Quadratic Programming
Indefinite Quadratic Programs and DCAs
Indefinite quadratic programming (IQP) plays a crucial role in optimization theory and has diverse applications across various fields Sequential quadratic programming methods, such as Wilson’s and Pang’s methods, simplify nonlinear programming problems into a series of IQPs Gupta reviews IQP applications in finance, agriculture, economics, production, marketing, and public policy, while Cornu´ejols et al focus on quadratic programming models specifically in finance Additionally, the image enhancement problem can be formulated as a quadratic programming issue, as demonstrated by Jen and Wang Akoa explores IQP in training support vector machines with nonpositive-semi-definite kernels, and recent studies by Xu et al and Xue et al have addressed similar machine learning challenges Liu et al have also investigated IQP related to support vector machines with indefinite kernels, which is gaining traction in the machine learning community For applications involving quadratic constraints, Wiebking's research is recommended.
Numerous studies have explored numerical methods for solving indefinite quadratic programming (IQP) problems; see, e.g., [17, 19, 78–80, 82, 101–103]. It is worth noting that most of the existing algorithms find only stationary points (i.e., the Karush-Kuhn-Tucker points) or local minimizers, so these methods are predominantly local solution methods. Since the IQP is NP-hard (see [72] and [17]), finding global solutions remains a significant challenge.
We are interested in studying and implementing two methods for solving the IQP which are based on the general framework for solving DC programs developed by Pham Dinh Tao and Le Thi Hoai An, in which DCA can also be combined with interior point techniques for solving large-scale problems. Two different DC decompositions of the objective function, the projection DC decomposition and the proximal DC decomposition, give rise to two algorithms for the IQP: the Projection DC decomposition algorithm (Algorithm A) and the Proximal DC decomposition algorithm (Algorithm B); see [57, 82] for details. These algorithms have the following notable features:
- The algorithm descriptions are simple;
- No line searches are required.
The DCA theory confirms that any cluster point of a DCA sequence, produced by specific algorithms, qualifies as a KKT point of the IQP To ensure the existence of these cluster points, it is crucial to establish the boundedness of the DCA sequence While DCA sequences are not inherently bounded, a conjecture suggests that if the IQP possesses global solutions, then all DCA sequences generated by algorithms A and B should be bounded Recent work by Tuan has affirmed this conjecture for two-dimensional IQPs To extend this result to the general case, Tuan employed a local error bound for affine variational inequalities, along with various properties of the KKT point set of the IQP identified by Luo and Tseng The key finding indicates that if the IQP has a nonempty solution set, every DCA sequence generated by Algorithm A converges R-linearly to a KKT point.
Numerous numerical tests, which will be reported in Section 2.4, lead us to the following observations:
In both algorithms A and B, a positive decomposition parameter that is closer to the lower bound of the admissible parameter interval leads to a higher convergence rate of the DCA sequences.
- Applied to the same problem with the same initial point, Algorithm B is more efficient than Algorithm A in terms of the number of computation steps and the execution time.
Our findings align with the recent work of Le Thi, Huynh, and Pham Dinh, which presents original proofs of significant convergence theorems for DCA algorithms addressing optimization problems with subanalytic data Specifically, Theorems 3.4, 3.5, and 4.2 from their study indicate that any DCA sequence produced by Algorithm B converges R-linearly to a KKT point, provided the sequence is bounded However, the boundedness of DCA sequences cannot be established through the Lojasiewicz inequality, as noted in their Theorem 2.1, along with related insights on Kurdyka-Lojasiewicz properties Thus, Theorem 2.2 and its proof represent novel contributions to the analysis of existing solution algorithms in the realm of indefinite quadratic programming.
Consider the indefinite quadratic programming problem under linear constraints (called the IQP for brevity):
min{ f(x) := (1/2) xᵀQx + qᵀx | Ax ≥ b }, (2.1)
where Q ∈ R^{n×n} and A ∈ R^{m×n} are given matrices, Q is symmetric, and q ∈ R^n and b ∈ R^m are arbitrarily given vectors. The constraint set of the problem is C := { x ∈ R^n | Ax ≥ b }.
Since xᵀQx is an indefinite quadratic form, the objective function f(x) may be nonconvex; hence (2.1) is a nonconvex optimization problem.
Now we describe some standard notations that will be used later on. The unit matrix in R^{n×n} is denoted by E. The eigenvalues of a symmetric matrix M ∈ R^{n×n} are arranged in nondecreasing order, λ₁(M) ≤ ... ≤ λₙ(M), counting multiplicities. For a subset α ⊂ {1, ..., m}, A_α denotes the matrix formed by the rows A_i, i ∈ α, of A, and b_α denotes the vector formed by the components b_i, i ∈ α, of b. The pseudo-face of the convex set C corresponding to α is the set { x ∈ R^n | A_α x = b_α, A_ᾱ x > b_ᾱ }, where ᾱ := {1, ..., m} \ α. By B(x, ε) and B̄(x, ε) we denote, respectively, the open and the closed ball centered at x with radius ε. For vectors v¹, ..., v^s ∈ R^n, pos{v¹, ..., v^s} denotes the closed convex cone generated by them, i.e., pos{v¹, ..., v^s} = { t₁v¹ + ... + t_s v^s | t_i ≥ 0, i = 1, ..., s }.
The metric projection of u ∈ R^n onto C is denoted by P_C(u); that is, P_C(u) belongs to C and ‖u − P_C(u)‖ = min_{x∈C} ‖u − x‖.
The tangent cone to C at x ∈ C is denoted by T_C(x), i.e.,
T_C(x) = { t(y − x) | t ≥ 0, y ∈ C } = { v ∈ R^n | A_α v ≥ 0 },
where α = { i | A_i x = b_i }. The normal cone to C at x ∈ C is denoted by N_C(x), i.e., N_C(x) = { w ∈ R^n | ⟨w, v⟩ ≤ 0 for all v ∈ T_C(x) }.
To solve the IQP by a sequence of strongly convex quadratic programs, one represents the objective function as the difference of two convex linear-quadratic functions,
f(x) = ϕ(x) − ψ(x), where ϕ(x) := (1/2)xᵀQ₁x + qᵀx, ψ(x) := (1/2)xᵀQ₂x, (2.2)
with Q₁ symmetric positive definite, Q₂ symmetric positive semidefinite, and Q = Q₁ − Q₂. Then (2.1) can be rewritten as the DC program
min{ g(x) − h(x) | x ∈ R^n },
where g(x) := ϕ(x) + δ_C(x), h(x) := ψ(x), and δ_C(x) is the indicator function of C, i.e., δ_C(x) = 0 if x ∈ C and δ_C(x) = +∞ otherwise. Starting from an initial point x⁰ ∈ R^n, at each step k ≥ 0 one computes y^k = ∇h(x^k) = Q₂x^k and then finds x^{k+1} as the unique solution of the strongly convex quadratic minimization problem
min{ (1/2)xᵀQ₁x + qᵀx − xᵀQ₂x^k | x ∈ C }. (2.3)
The obtained sequence {x^k} is called the DCA sequence generated by the DC algorithm and the initial point x⁰.
Definition 2.1 For x ∈ R^n, if there exists a multiplier λ ∈ R^m such that
Qx + q − Aᵀλ = 0, Ax ≥ b, λ ≥ 0, λᵀ(Ax − b) = 0, (2.4)
then x is said to be a Karush-Kuhn-Tucker point (a KKT point) of the IQP.
Thus, x is a KKT point of the IQP if and only if x ∈ C and ⟨∇f(x), v⟩ = (Qx + q)ᵀv ≥ 0 for every v ∈ T_C(x), which is equivalent to saying that ⟨∇f(x), y − x⟩ ≥ 0 for every y ∈ C. Therefore, x is a KKT point of the IQP if and only if x is a solution of the affine variational inequality
x ∈ C, ⟨Qx + q, u − x⟩ ≥ 0 for all u ∈ C. (2.6)
Denote the KKT point set (resp., the global solution set) of the IQP by C* (resp., S). It is well known (see, e.g., [50]) that S ⊂ C*.
Let us recall some properties of DCA sequences that follow from the fundamental theorem on DCAs (Theorem 1.1 and Remark 1.1) applied to the IQP (2.1). Note that the modulus of convexity of the function g(x) = ϕ(x) + δ_C(x), with ϕ(x) = (1/2)xᵀQ₁x + qᵀx, is at least λ₁(Q₁), while the modulus of convexity of the function h(x) = ψ(x) = (1/2)xᵀQ₂x equals λ₁(Q₂).
Theorem 2.1 (See [81, Theorem 3] and [82, Theorem 2.1]) Every DCA sequence {x^k} generated by the above DC algorithm and an initial point x⁰ ∈ R^n has the following properties:
(i) f(x^{k+1}) ≤ f(x^k) − ((λ₁(Q₁) + λ₁(Q₂))/2) ‖x^{k+1} − x^k‖² for every k ≥ 1;
(ii) {f(x^k)} converges to an upper bound f* for the optimal value of (2.1);
(iii) every cluster point x̄ of {x^k} is a KKT point of (2.1);
(iv) if inf_{x∈C} f(x) > −∞, then lim_{k→∞} ‖x^{k+1} − x^k‖ = 0.
Remark 2.1 By [81, Theorem 3], if x⁰ ∈ C then the inequality in (i) holds for every k ≥ 0. To see this, it suffices to note that x⁰ ∈ C = dom g, where g = ϕ + δ_C and dom g := { x | g(x) < +∞ }.
The smallest eigenvalue λ₁(Q) and the largest eigenvalue λₙ(Q) of the symmetric matrix Q can be computed efficiently by standard numerical linear algebra routines. The DC decomposition (2.2) can be chosen, for instance, in one of the following two ways:
(a) Q₁ := ρE, Q₂ := ρE − Q, where ρ is a positive real number satisfying the condition ρ ≥ λₙ(Q);
(b) Q₁ := Q + ρE, Q₂ := ρE, where ρ is a positive real number satisfying the condition ρ > −λ₁(Q).
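The two decompositions can be formed directly from the spectrum of Q. The sketch below (with an assumed 2×2 indefinite matrix Q) computes λ₁(Q) and λₙ(Q) with a standard symmetric eigenvalue routine and verifies that both choices (a) and (b) give Q = Q₁ − Q₂ with Q₁ positive definite and Q₂ positive semidefinite.

```python
import numpy as np

Q = np.array([[2.0, 0.0],
              [0.0, -3.0]])                  # assumed indefinite symmetric matrix
lam = np.linalg.eigvalsh(Q)                  # eigenvalues in nondecreasing order
lam1, lamn = lam[0], lam[-1]                 # lambda_1(Q) = -3, lambda_n(Q) = 2
E = np.eye(Q.shape[0])

# (a) projection decomposition: rho >= lambda_n(Q), rho > 0
rho_a = lamn
Q1_a, Q2_a = rho_a * E, rho_a * E - Q

# (b) proximal decomposition: rho > -lambda_1(Q)
rho_b = -lam1 + 1.0
Q1_b, Q2_b = Q + rho_b * E, rho_b * E

for Q1, Q2 in ((Q1_a, Q2_a), (Q1_b, Q2_b)):
    assert np.allclose(Q1 - Q2, Q)
    assert np.linalg.eigvalsh(Q1)[0] > 0     # Q1 positive definite
    assert np.linalg.eigvalsh(Q2)[0] >= 0    # Q2 positive semidefinite
```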
The number ρ is called the decomposition parameter The following algo- rithms appear on the basis of (a) and (b), respectively.
The Projection DC decomposition algorithm (Algorithm A). Choose a number ρ > 0 satisfying ρ ≥ λₙ(Q) and an initial point x⁰ ∈ R^n; here Q₁ := ρE and Q₂ := ρE − Q. For each k ≥ 0, put y^k := Q₂x^k = (ρE − Q)x^k and compute x^{k+1} as the unique solution of (2.3), which in this case is given by
x^{k+1} := P_C( x^k − (1/ρ)(Qx^k + q) ),
i.e., x^{k+1} solves the convex program
min{ ‖x − (1/ρ)(y^k − q)‖² | x ∈ C }. (2.7)
The scheme of the algorithm with a stopping criterion is as follows (To have an infinite DCA sequence, one has to choose ε = 0.)
Step 1. Choose x⁰ ∈ R^n and ε ≥ 0. Put k = 0.
Step 2. Compute y^k = (ρE − Q)x^k.
Step 3. Calculate x^{k+1} by solving the convex program (2.7).
Step 4 If kx k+1 −x k k ≤ ε then stop, else go to Step 5.
Step 5 Set k := k+ 1 and go to Step 2.
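A minimal sketch of Algorithm A for the special case where C is a box {x : l ≤ x ≤ u} (a particular instance of {x : Ax ≥ b}), so that the metric projection P_C reduces to a componentwise clip; the test matrix, bounds, and tolerance are assumptions made for illustration.

```python
import numpy as np

def algorithm_A(Q, q, lower, upper, x0, eps=1e-8, max_iter=10_000):
    """Projection DC decomposition algorithm for min 1/2 x^T Q x + q^T x
    over the box C = {x : lower <= x <= upper} (so P_C is a clip)."""
    rho = max(np.linalg.eigvalsh(Q)[-1], 1e-12)   # rho >= lambda_n(Q) and rho > 0
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # x_{k+1} = P_C(x_k - (Q x_k + q)/rho), cf. the formula above
        x_new = np.clip(x - (Q @ x + q) / rho, lower, upper)
        if np.linalg.norm(x_new - x) <= eps:
            return x_new
        x = x_new
    return x

# Assumed test data with an indefinite Q.
Q = np.array([[2.0, 0.0], [0.0, -1.0]])
q = np.array([0.0, 0.0])
x_bar = algorithm_A(Q, q, lower=-1.0, upper=1.0, x0=np.array([0.3, 0.1]))
print(x_bar)      # converges to the KKT point (0, 1) for this starting point
```

For a general polyhedral set C, the projection in (2.7) is itself a strongly convex quadratic program and would be delegated to a QP solver.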
The Proximal DC decomposition algorithm (Algorithm B). Choose a number ρ > 0 satisfying ρ > −λ₁(Q) and an initial point x⁰ ∈ R^n; here Q₁ := Q + ρE and Q₂ := ρE. For each k ≥ 0, compute x^{k+1} as the unique solution of the strongly convex quadratic minimization problem
min{ ψ_k(x) := (1/2)xᵀQ₁x + qᵀx − xᵀQ₂x^k | x ∈ C }. (2.9)
The scheme of the algorithm with a stopping criterion is as follows. (To have an infinite DCA sequence, one has to choose ε = 0.)
Step 1. Choose x⁰ ∈ R^n and ε ≥ 0. Put k = 0.
Step 2. Calculate x^{k+1} by solving the convex program (2.9).
Step 3 If kx k+1 −x k k ≤ ε then stop, else go to Step 4.
Step 4 Set k := k+ 1 and go to Step 2.
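A minimal sketch of Algorithm B in the same box-constrained setting. The inner strongly convex subproblem (2.9) is solved here by iterating the map G_k(x) = P_C(x^k − (Qx + q)/ρ) from (2.12), which is a contraction when ρ > ‖Q‖ (the stronger parameter choice also used in Theorem 2.3 below); the data and tolerances are assumptions.

```python
import numpy as np

def algorithm_B(Q, q, lower, upper, x0, eps=1e-8, inner_tol=1e-12, max_iter=10_000):
    """Proximal DC decomposition algorithm on a box C.  Each outer step solves
    the subproblem (2.9) as the fixed point of G_k(x) = P_C(x_k - (Qx + q)/rho),
    which is a contraction because rho > ||Q|| (hence also rho > -lambda_1(Q))."""
    rho = 1.1 * np.linalg.norm(Q, 2) + 1e-6
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        z = x.copy()
        for _ in range(1000):                              # inner fixed-point iteration
            z_new = np.clip(x - (Q @ z + q) / rho, lower, upper)
            if np.linalg.norm(z_new - z) <= inner_tol:
                break
            z = z_new
        if np.linalg.norm(z_new - x) <= eps:
            return z_new
        x = z_new
    return x

Q = np.array([[2.0, 0.0], [0.0, -1.0]])                    # assumed test data, as before
q = np.array([0.0, 0.0])
print(algorithm_B(Q, q, -1.0, 1.0, np.array([0.3, 0.1])))  # the KKT point (0, 1) here
```

When ρ is only slightly larger than −λ₁(Q), the map G_k need not be a contraction, and the subproblem (2.9) should instead be handed to a quadratic programming solver.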
Convergence and Convergence Rate of the Algorithm
As described in Section 2.1, the KKT point set C* coincides with the solution set of the affine variational inequality (2.6); hence it is the union of finitely many polyhedral convex sets and, in particular, has finitely many connected components. If the solution set S of (2.1) is nonempty, then C* is nonempty. For any subset M ⊂ R^n and any x ∈ R^n, the distance from x to M is defined by d(x, M) := inf{ ‖x − y‖ | y ∈ M }.
We will need two auxiliary results. The next lemma gives a local error bound for the distance d(x, C*) from a feasible point x ∈ C to C*.
Lemma 2.1 ([92, Lemma 2.1]; cf. [65, Lemma 3.1]) For any ρ > 0, if C* ≠ ∅, then there exist scalars ε > 0 and ℓ > 0 such that
d(x, C*) ≤ ℓ ‖x − P_C( x − (1/ρ)(Qx + q) )‖ (2.10)
for all x ∈ C with
‖x − P_C( x − (1/ρ)(Qx + q) )‖ ≤ ε. (2.11)
Lemma 2.2 ([65, Lemma 3.1]; see also [92, Lemma 2.2]) Let C₁, C₂, ..., C_r denote the connected components of C*. Then we have C* = C₁ ∪ ... ∪ C_r, and the following properties are valid:
(a) each C i is the union of finitely many polyhedral convex sets;
(b) the sets C_i, i = 1, ..., r, are properly separated from each other, that is, there exists δ > 0 such that if i ≠ j, then d(x, C_j) ≥ δ for all x ∈ C_i;
(c) f is constant on each Ci.
Since x^{k+1} is the unique solution of (2.9), it is the unique solution of the variational inequality
⟨∇ψ_k(x^{k+1}), x − x^{k+1}⟩ ≥ 0 for all x ∈ C,
where ∇ψ_k(x^{k+1}) = Qx^{k+1} + q + ρx^{k+1} − ρx^k. In other words, x^{k+1} is the unique solution of the strongly monotone affine variational inequality defined by the affine operator x ↦ (Q + ρE)x + q − ρx^k and the polyhedral convex set C. Therefore, applying Theorem 2.3 from [45, p. 9], we see that x^{k+1} is the unique fixed point of the map G_k(x) := P_C( x − μ(Mx + q_k) ), where μ > 0 is arbitrarily chosen, M := Q + ρE, and q_k := q − ρx^k. In what follows, we choose μ = ρ^{−1}. Then
x^{k+1} = P_C( x^{k+1} − (1/ρ)(Mx^{k+1} + q_k) ). (2.12)
The convergence and the rate of convergence of Algorithm B, the Proximal
DC decomposition algorithm, can be formulated as follows.
Theorem 2.2 If (2.1) has a solution, then for each x⁰ ∈ R^n, the DCA sequence {x^k} constructed by Algorithm B converges R-linearly to a KKT point of (2.1), that is, there exists x̄ ∈ C* such that lim sup_{k→∞} ‖x^k − x̄‖^{1/k} < 1.
Proof. Since (2.1) has a solution, C* ≠ ∅. Hence, by Lemma 2.1 there exist ℓ > 0 and ε > 0 such that (2.10) is fulfilled for any x satisfying (2.11). As inf_{x∈C} f(x) > −∞, assertion (iv) of Theorem 2.1 gives
lim_{k→∞} ‖x^{k+1} − x^k‖ = 0. (2.13)
Choose k₀ ∈ N so large that ‖x^{k+1} − x^k‖ < ε for all k ≥ k₀.
If it holds that
‖x^{k+1} − P_C( x^{k+1} − (1/ρ)(Qx^{k+1} + q) )‖ ≤ ε for all k ≥ k₀, (2.14)
then by (2.10) one has
d(x^{k+1}, C*) ≤ ℓ ‖x^{k+1} − P_C( x^{k+1} − (1/ρ)(Qx^{k+1} + q) )‖ for all k ≥ k₀. (2.15)
To obtain (2.14), for any k ≥ k₀ we recall that
x^{k+1} = G_k(x^{k+1}) = P_C( x^{k+1} − (1/ρ)(Mx^{k+1} + q_k) ). (2.16)
Since ( x^{k+1} − (1/ρ)(Mx^{k+1} + q_k) ) − ( x^{k+1} − (1/ρ)(Qx^{k+1} + q) ) = x^k − x^{k+1}, combining (2.16) with the nonexpansiveness of P_C(·) [45, Corollary 2.4, p. 10] yields
‖x^{k+1} − P_C( x^{k+1} − (1/ρ)(Qx^{k+1} + q) )‖ ≤ ‖x^{k+1} − x^k‖ < ε.
Hence (2.14) is valid and, in addition, we have ‖x^{k+1} − P_C( x^{k+1} − (1/ρ)(Qx^{k+1} + q) )‖ ≤ ‖x^{k+1} − x^k‖. From this and (2.15) it follows that
d(x^{k+1}, C*) ≤ ℓ ‖x^{k+1} − x^k‖ for all k ≥ k₀. (2.17)
Since C* is closed and nonempty, for each k ∈ {0, 1, 2, ...} we can find y^k ∈ C* such that d(x^k, C*) = ‖x^k − y^k‖. Then (2.17) implies that
‖x^{k+1} − y^{k+1}‖ ≤ ℓ ‖x^{k+1} − x^k‖ for all k ≥ k₀. (2.18)
So, as a consequence of (2.13),
lim_{k→∞} ‖y^{k+1} − x^{k+1}‖ = 0. (2.19)
Since ‖y^{k+1} − y^k‖ ≤ ‖y^{k+1} − x^{k+1}‖ + ‖x^{k+1} − x^k‖ + ‖x^k − y^k‖, it follows that lim_{k→∞} ‖y^{k+1} − y^k‖ = 0. Considering the connected components C₁, C₂, ..., C_r of C* and applying Lemma 2.2 together with the last limit, we find an index i₀ and an integer k₁ ≥ k₀ such that y^k ∈ C_{i₀} for all k ≥ k₁. Hence, by assertion (c) of Lemma 2.2, we have
f(y^k) = c for all k ≥ k₁, (2.21)
where c ∈ R is a constant.
Since (2.1) has a solution, by Theorem 2.1 we can find a real value f* such that lim_{k→∞} f(x^k) = f*.
By the classical Mean Value Theorem and the formula ∇f(x) = Qx + q, for every k there is z^k ∈ (x^k, y^k) := { (1 − t)x^k + t y^k | 0 < t < 1 } such that f(y^k) − f(x^k) = ⟨Qz^k + q, y^k − x^k⟩.
Since y^k is a KKT point, it holds that 0 ≤ ⟨Qy^k + q, x^k − y^k⟩. Adding this inequality and the preceding equality, we get
f(y^k) − f(x^k) ≤ ⟨Q(z^k − y^k), y^k − x^k⟩ ≤ ‖Q‖ ‖y^k − x^k‖². (2.22)
On one hand, from (2.21) and (2.22) it follows that c = f(y^k) ≤ f(x^k) + ‖Q‖ ‖y^k − x^k‖² for all k ≥ k₁.
As lim_{k→∞} [ f(x^k) + ‖Q‖ ‖y^k − x^k‖² ] = f* due to (2.19), this forces
c ≤ f*. (2.23)
On the other hand, since x^{k+1} = P_C( x^{k+1} − (1/ρ)(Mx^{k+1} + q_k) ) by (2.16), the characterization of the metric projection onto a closed convex set [45, Theorem 2.3, p. 9] gives us
⟨Mx^{k+1} + q_k, y^{k+1} − x^{k+1}⟩ ≥ 0 for all k ∈ N.
From this and (2.18) we get
⟨My^{k+1} + q_k, x^{k+1} − y^{k+1}⟩ = ⟨Mx^{k+1} + q_k, x^{k+1} − y^{k+1}⟩ + ⟨M(y^{k+1} − x^{k+1}), x^{k+1} − y^{k+1}⟩ ≤ ‖M‖ ‖x^{k+1} − y^{k+1}‖² ≤ ℓ²‖M‖ ‖x^{k+1} − x^k‖²
for all k ≥ k₀. So, setting α := ℓ²‖M‖, we have
⟨My^{k+1} + q_k, x^{k+1} − y^{k+1}⟩ ≤ α‖x^{k+1} − x^k‖². (2.24)
For each k ≥ k₁, since M = Q + ρE and q_k = q − ρx^k, invoking (2.24) and using (2.18) once more, we have
f(x^{k+1}) − c = f(x^{k+1}) − f(y^{k+1})
≤ α‖x^{k+1} − x^k‖² + (1/2)‖Q‖ ‖x^{k+1} − y^{k+1}‖² + ρ‖x^{k+1} − x^k‖ ‖x^{k+1} − y^{k+1}‖ + ρ‖x^{k+1} − y^{k+1}‖²
≤ ( α + (1/2)‖Q‖ℓ² + ρℓ(1 + ℓ) ) ‖x^{k+1} − x^k‖².
Therefore, with β := α + (1/2)‖Q‖ℓ² + ρℓ(1 + ℓ), we get
f(x^{k+1}) ≤ c + β‖x^{k+1} − x^k‖². (2.25)
Letting k → ∞, from (2.25) we can deduce that f* = lim_{k→∞} f(x^{k+1}) ≤ c.
Combining the last expression with (2.23) yields f* = c. Therefore, by (2.25) and the first assertion of Theorem 2.1 we obtain
f(x^{k+1}) − f* ≤ β‖x^{k+1} − x^k‖² ≤ (2β/(λ₁(Q₁) + λ₁(Q₂))) ( f(x^k) − f(x^{k+1}) ),
where Q₁ = Q + ρE and Q₂ = ρE. Putting γ := λ₁(Q₁) + λ₁(Q₂), from the condition ρ > −λ₁(Q) we get γ = (λ₁(Q) + ρ) + ρ > 0. Therefore,
f(x^{k+1}) − f* ≤ (2β/γ)[ (f(x^k) − f*) − (f(x^{k+1}) − f*) ],
whence f(x^{k+1}) − f* ≤ μ²(f(x^k) − f*) with μ := (2β/(2β + γ))^{1/2} ∈ (0, 1). Hence there is a constant r₀ ≥ 0 such that f(x^k) − f* ≤ r₀μ^{2k} for all k > k₁. Consequently,
f(x^k) − f(x^{k+1}) = (f(x^k) − f*) − (f(x^{k+1}) − f*) ≤ r₀μ^{2k+2} + r₀μ^{2k} = r₁μ^{2k} for all k > k₁,
where r₁ := r₀(μ² + 1). Using the first assertion of Theorem 2.1 once more, we see that
‖x^{k+1} − x^k‖² ≤ (2/γ)( f(x^k) − f(x^{k+1}) ) ≤ (2r₁/γ) μ^{2k} for all k > k₁.
It follows that ‖x^{k+1} − x^k‖ ≤ rμ^k for all k > k₁, where r := (2r₁/γ)^{1/2} and μ ∈ (0, 1). Let ε > 0 be given arbitrarily. For each positive integer p, we have
‖x^{k+p} − x^k‖ ≤ ‖x^{k+p} − x^{k+p−1}‖ + ... + ‖x^{k+1} − x^k‖ ≤ r( μ^{k+p−1} + ... + μ^k ) ≤ (r/(1 − μ)) μ^k.
Since (r/(1 − μ)) μ^k < ε for all k sufficiently large, {x^k} is a Cauchy sequence; hence it converges to a point x̄ ∈ C. By the third assertion of Theorem 2.1, x̄ ∈ C*. Moreover, passing the inequality ‖x^{k+p} − x^k‖ ≤ (r/(1 − μ)) μ^k to the limit as p → ∞, we get
‖x^k − x̄‖ ≤ (r/(1 − μ)) μ^k for all k large enough. So, ‖x^k − x̄‖^{1/k} ≤ (r/(1 − μ))^{1/k} μ for all k large enough. Therefore,
lim sup_{k→∞} ‖x^k − x̄‖^{1/k} ≤ μ < 1.
This proves that {x^k} converges R-linearly to a KKT point of (2.1). □
Remark 2.2 According to Theorem 2.2, one can find a constant C ∈ (0, 1) such that lim sup_{k→∞} ‖x^k − x̄‖^{1/k} < C < 1. So, if the computation is terminated at step k, provided that k is sufficiently large, then one has ‖x^k − x̄‖^{1/k} < C, that is, ‖x^k − x̄‖ < C^k. Therefore, the computation error between the obtained approximate solution x^k and the exact limit point x̄ of the sequence {x^k} is smaller than C^k. Since C ∈ (0, 1), the error bound C^k tends to 0 as k → ∞.
Asymptotical Stability of the Algorithm
We will prove that DCA sequences generated by Algorithm B converge to a locally unique solution of (2.1) if the initial points are taken from a suitably-chosen neighborhood of it.
In the context of discrete dynamical systems, we can define a stability concept relevant to iteration algorithms that generate a unique point \( x_{k+1} \) based on the previously defined iteration point \( x_k \), where \( k \) belongs to the set of non-negative integers According to Leong and Goh, the notion of asymptotic stability for a KKT point can be articulated as follows.
Definition 2.2 The KKT point ¯x of (2.1) is:
(i) stable with respect to the iteration algorithm if, for any ε > 0, there exists δ > 0 such that whenever x⁰ ∈ B(x̄, δ), the DCA sequence {x^k} generated by the iteration algorithm and the initial point x⁰ satisfies x^k ∈ B(x̄, ε) for all k ≥ 0;
(ii) attractive if there exists δ > 0 such that whenever x 0 ∈ B(¯x, δ), the DCA sequence generated by the iteration algorithm and the initial point x 0 has the property lim k→∞x k = ¯x;
(iii) asymptotically stable w.r.t the iteration algorithm if it is stable and attractive w.r.t to that algorithm.
In an optimization problem defined as min{g(x) | x ∈ Ω}, where g : R n → R and Ω ⊂ R n represent a real function and an arbitrary subset respectively, a point ¯x ∈ Ω is considered a locally unique solution if there exists a radius ε > 0 such that g(x) is greater than g(¯x) for all x in the intersection of Ω and the ball B(¯x, ε), excluding the point ¯x itself The following two lemmas illustrate some established concepts related to this definition.
Lemma 2.3 (See, e.g., [50, Theorem 3.8]) If x¯ ∈ C is a locally unique solu- tion of (2.1), then there exist à > 0 and η > 0 such that f(x)−f(¯x) ≥ηkx−xk¯ 2 for every x ∈ C ∩B(¯x, à) (2.26)
Lemma 2.4 (See, e.g., [17, Proof of Lemma 4] and [57, Lemma 1]) If the KKT point set C ∗ contains a segment [u, x], then the restriction of f on that segment is a constant function.
The main result of this section can be formulated as follows.
Theorem 2.3 Consider Algorithm B and require additionally that ρ > kQk. Suppose x¯ is a locally unique solution of problem (2.1) In that case, for any γ > 0 there exists δ > 0 such that if x 0 ∈ C ∩B(¯x, δ) and if {x k } is the DCA sequence generated by Algorithm B and the initial point x 0 , then
In other words, x¯ is asymptotically stable w.r.t Algorithm B.
In this proof, we assume that ρ > kQk and that ¯x is a locally unique solution of equation (2.1) According to Lemma 2.3, we can choose constants à > 0 and η > 0 to satisfy condition (2.26) For any γ > 0, we can adjust γ to ensure it falls within the range (0, à) and is less than à(1−ρ −1 kQk) Given that f(x)−f(¯x) > 0 for all x in the intersection of C and the ball B(¯x, γ) excluding ¯x, as stated in (2.26), the continuity of the function f guarantees the existence of a δ within the interval (0, à).
The stability of DCA sequences generated by Algorithm B is confirmed for a chosen number δ > 0 By fixing any x₀ within the intersection of set C and the ball B(¯x, δ), and noting that δ < γ, we establish that for k = 0, xₖ is also within the intersection of C and B(¯x, γ) To continue this proof by induction, we assume that this inclusion holds for some k ≥ 0 Given that ¯x is a locally unique solution of the problem described in equation (2.1), it qualifies as a KKT point for that problem.
Indeed, by the characterization of the metric projection [45, Theorem 2.3, p 9], (2.29) is valid if and only if ¯ x− 1 ρ(Q¯x+q) −x¯
The latter is equivalent to (2.28) Using (2.12), (2.29), and the nonexpan- siveness of the metric projection [45, Corollary 2.4, p 10], we have kx k+1 −xk¯ = P C x k+1 − 1 ρ(M x k+1 +q k ) −P C x¯− 1 ρ(Q¯x+q)
We establish that the distance between consecutive iterations satisfies the inequality \( kx_{k+1} - x_k \leq \overline{(1 - \frac{1}{\rho_k Q_k})^{-1}} kx_k - x_k \leq \overline{(1 - \frac{1}{\rho_k Q_k})^{-1}} \gamma < \alpha \), with the strict inequality resulting from the condition \( \gamma < \alpha(1 - \rho_k^{-1} Q_k) \) Consequently, we find that \( x_{k+1} \) lies within the intersection of the set \( C \) and the ball \( B(\overline{x}, \alpha) \) By applying the previously mentioned inequality \( f(x_k) \geq f(x_{k+1}) \) for all \( k \geq 0 \), we derive that \( kx_{k+1} - x_k \overline{2} \leq \frac{1}{\eta}(f(x_{k+1}) - f(\overline{x})) \).
≤ 1 η f(x 0 )−f(¯x) Hence, kx k+1 −xk ≤¯ 1 η 1/2 f(x 0 )−f(¯x) 1/2 Since x 0 ∈ C ∩B(¯x, δ), combining this with (2.27) we obtain kx k+1 −xk¯ < γ which means that x k+1 ∈ C ∩B(¯x, γ) Thus, we have proved that x k ∈ C ∩B(¯x, γ) for every k ≥0.
To demonstrate the attractiveness of DCA sequences generated by Algorithm B, we establish that for any γ > 0, there exists a corresponding δ = δ(γ) > 0 If the initial point x₀ is within the set C and the ball B(¯x, δ), the property in (a) holds for the DCA sequence {xₖ} Assuming γ ∈ (0, ∞) and δ ∈ (0, γ), we can further assert that if x₀ is within C and B(¯x, δ), then property (b) is also valid If this were not true, we would find sequences γₗ → 0⁺ and δₗ → 0⁺ such that the stability assertion holds for the pairs (δₗ, γₗ) Additionally, for each j, there exists an x₀,j in C ∩ B(¯x, δₗ) where the generated DCA sequence {xₖ,j} does not converge to ¯x By selecting a subsequence of {xₖ,j} that converges to a point xₑ,j in C ∩ B(¯x, γₗ), where xₑ,j ≠ ¯x, we apply Theorem 2.1 to confirm xₑ,j ∈ C* for all j Notably, as j approaches infinity, xₑ,j converges to ¯x.
For each j, one can find an integer k(j) ≥ 1 such that γ j+k(j) < kxe j − xk.¯ Then, by (2.30) one has kxe j+k(j) −xk¯ < kxe j −xk.¯
Let z₁ = xe₁ and define zₚ₊₁ as exₚ₊k(p) for p = 1, 2, This establishes that the sequence {zₚ} is a subsequence of {xeⱼ}, with zₚ ≠ zₚ' for p ≠ p' By selecting an appropriate subsequence, we can ensure that xeⱼ ≠ xeₗ whenever j ≠ ℓ Given that the number of pseudo-faces of C is finite, there exists an index set α ⊂ {1, , m} such that the pseudo-face is defined accordingly.
The set F α is defined as {x ∈ R n | A α x = b α , A α ¯ x > b α ¯ }, which contains an infinite number of elements from the sequence {ex j } We can assume that the entire sequence {xe j } is included in F α According to Lemma 4.1, the intersection of C ∗ and F α forms a convex set Therefore, as stated in Lemma 2.4, the function f restricted to C ∗ ∩ F α is constant Consequently, from equation (2.31), we can conclude that f(xe j ) = f(¯x) for all j However, since xe j is not equal to ¯x for any j, this leads to a contradiction with equation (2.26), thereby proving our claim.
To illustrate asymptotical stability of Algorithmthe B, let us consider the following example.
Example 2.2 (see [50, Example 11.3, p 207]) Consider problem (2.1) with n = 2, m = 2, Q "
! Here, one has the objective function f(x) = 1 2 (x 2 1 −x 2 2 )−x 1 over the set
Since λ 1 = −1 and λ 2 = 1 are eigenvalues of Q, one can choose ρ = 2. Using (2.4), one obtains the KKT point set C ∗ = n(1,0), ( 4 3 , 2 3 ), ( 4 3 ,− 2 3 )o. For this problem, one has S=loc(P)= ( 4 3 , 2 3 ), ( 4 3 ,− 2
3) One selects initial point, say, x 0 = ( 3 2 , 1 2 ), and chooses ¯x = ( 4 3 , 2 3 ) ∈ S Let the tolerance ε > 0 be small enough and put δ = kx 0 −xk¯ √ 2
The initial point \( x_0 \) is within the intersection of set \( C \) and the ball \( B(\bar{x}, \delta) \) The DCA sequence \( \{x_k\} \) is generated by Algorithm B starting from \( x_0 \) For each \( k \) in the range \( \{0, \ldots, 27\} \), the condition \( kx_k - x_k \leq \bar{\epsilon} \) holds, ensuring that \( x_k \) remains within \( C \cap B(\bar{x}, \gamma) \), where \( \gamma = \epsilon \) (refer to Table 2.3 a and Figure 2.1) Similar outcomes are observed when selecting \( x_0 = (3/2, -1/5) \) and \( \bar{x} = (4/3, -2/3) \) (see Table 2.3 b).
Figure 2.1 : The DCA sequence generated by Algorithm B and x 0 = (1.5, 0.5)
Table 2.3: Asymptotical stability of Algorithm B k x k f (x k ) δ(γ) x k f (x k ) δ(γ)
Further Analysis
This section examines the impact of the decomposition parameter ρ on the convergence rates of Algorithms A and B, while also comparing the effectiveness of Algorithm B against Algorithm A Both algorithms were implemented using Visual C++ 2010 and tested on an Intel Core i7 processor with 4GB of RAM The CPLEX 11.2 solver was utilized to address linear and convex quadratic problems.
Recall that, for Algorithm A, the parameter ρ > 0 has to satisfy the in- equality ρ ≥ λ n (Q) For Algorithm B, ρ > 0 must satisfy the strict inequality ρ > −λ 1 (Q).
We applied algorithms A and B to address test problems of the form (2.1) across various dimensions: n = 10, 20, 40, 60, and 80 For each case, the parameters β i were randomly generated within the range of [0,10], and we evaluated two distinct types of constraint sets.
The solution sets of the linear inequality system Ax ≥ b can be represented using a chosen matrix A and vector b For dimensions n ∈ {10,20,40,60,80}, we randomly generate a symmetric matrix Q and a vector q, ensuring all components lie within [0,10] The initial point x₀ is also randomly generated with components in [0,5] We test Algorithm A using the smallest decomposition parameter ρ, set to λₙ(Q) if positive, or 0.1 otherwise Similarly, Algorithm B is tested with ρ = -λ₁(Q) + 0.1 if λ₁(Q) is negative, or 0.1 otherwise The algorithms stop when the convergence criterion kxₖ₊₁ - xₖk ≤ 10⁻⁶ is met, with a maximum of 1000 steps After testing each algorithm with a given ρ, we increase ρ by 1.5 times and rerun the algorithm.
Due to the space limitation, we only present the test results for n 10, 40, 80.
Table 2.4 displays the results of Algorithm A and Algorithm B applied to the same problem with identical initial conditions The second rows of sub-tables a) and b) outline the smallest decomposition parameters for each algorithm, while subsequent rows indicate parameters that are 1.5 times larger than those in the previous row The first column in the sub-tables lists the ordinal numbers of the tests, the second column shows the number of iterations, the third column reports the running times, and the fourth column details the decomposition parameters Notably, only 11 records are presented, as the 12th record indicates that Algorithm A requires over 1000 steps to complete the computation.
The contents of Tables 2.5–2.9 are similar to those of Table 2.4.
With any n belonging to the set {10,20,40,60,80}, a careful analysis of these Tables allows us to observe that:
• For both algorithms, if ρ increases, then the running time, as well as the number of computation steps, increases;
•For any row of the sub-tables a) and b) with the same ordinal number, the number of steps required by Algorithm B is much smaller than that required by Algorithm A.
• For any row of the sub-tables a) and b) with the same ordinal number, the running time of Algorithm B is much smaller than that of Algorithm A.
Thus, in terms of the number of computation steps and the execution time,Algorithm B is much more efficient than Algorithm A when the algorithms are applied to the same problem.
Table 2.4: The test results for n = 10 with the 1st type constraint
No Step Time ρ No Step Time ρ
Table 2.5: The test results for n = 10 with the 2nd type constraint
No Step Time ρ No Step Time ρ
Table 2.6: The test results for n = 40 with the 1st type constraint
No Step Time ρ No Step Time ρ
Table 2.7: The test results for n = 40 with the 2nd type constraint
No Step Time ρ No Step Time ρ
Table 2.8: The test results for n = 80 with the 1st type constraint
No Step Time ρ No Step Time ρ
Table 2.9: The test results for n = 80 with the 2nd type constraint
No Step Time ρ No Step Time ρ
Conclusions
We have established two properties of Algorithm B for the IQP problem:
- Every DCA sequence generated by the Algorithm B must be bounded and, moreover, it converges R-linearly to a KKT point of the problem in question.
Algorithm B demonstrates asymptotic stability when the initial point is sufficiently close to a locally unique solution of the problem, contingent upon the DCA decomposition parameter meeting a specific mild assumption.
We have carried many numerical experiments which demonstrate that:
The decomposition parameter significantly affects the convergence rate of DCA sequences, with higher values leading to increased execution times Consequently, it is advisable to select the smallest feasible decomposition parameter to optimize performance.
- Algorithm B is more efficient than Algorithm A upon randomly generated data sets.
Qualitative Properties of the Minimum Sum-of-Squares
Clustering Problems
Clustering plays a crucial role in data mining, serving as a powerful tool for automated data analysis A cluster consists of a subset of the dataset, where the elements within each cluster share similarities in specific aspects.
Clustering problems can be categorized based on various criteria, including Euclidean distance, L1-distance, and the square of the Euclidean distance Among these, the Minimum Sum-of-Squares Clustering (MSSC) criterion is widely recognized and frequently utilized in clustering applications.
The Minimum Sum of Squared Clusters (MSSC) problem involves partitioning a finite dataset into a specified number of clusters The objective is to minimize the total squared Euclidean distances from each data point to its respective cluster centroid.
The importance of the MSSC problem was noticed by researchers long time ago and they have developed many algorithms to solve it (see, e.g.,
The NP-hard nature of the problem limits existing algorithms to achieving only local solutions To enhance data partitioning and discover better outcomes, various techniques have been proposed Notably, a method introduced in reference [71] focuses on identifying effective starting points using Difference-of-Convex-functions Algorithms (DCA), which have also been utilized in addressing the MSSC problem as detailed in references [7, 52, 60].
This chapter aims to establish fundamental properties of the discussed problem, starting with the clarification of the equivalence between the mixed integer programming formulation and the unconstrained nonsmooth nonconvex optimization formulation, as outlined in [71] Furthermore, we demonstrate that the MSSC problem consistently yields a global solution; under certain mild conditions, the set of global solutions is finite, and each global solution's components can be determined through an explicit formula.
This chapter aims to characterize the local solutions of the Minimum Sum of Squared Clusters (MSSC) problem By utilizing the necessary optimality condition in DC programming and introducing the concept of a nontrivial local solution, we derive necessary conditions for a system of centroids to qualify as such Notably, we demonstrate that these conditions are also sufficient, enhancing our understanding of local solutions Given that existing algorithms for the MSSC problem concentrate on local solutions, our findings could refine these algorithms Additionally, we analyze the performance of the k-means algorithm as a fundamental method for solving the MSSC problem through a constructed example.
This chapter aims to analyze how small changes in the data set affect the optimal value, global solution set, and local solution set of the MSSC problem We will establish three key stability properties: the optimal value function is locally Lipschitz, the global solution map is locally upper Lipschitz, and the local solution map exhibits the Aubin property, given that the original data points are pairwise distinct.
In the n-dimensional Euclidean space R n, let A = {a 1 , , a m } represent a finite set of data points The goal is to partition this set into k disjoint subsets, known as clusters, where k is a positive integer not exceeding m This partitioning aims to optimize a specific clustering criterion.
If one associates to each cluster A j a center (or centroid), denoted by x j ∈ R n , then the following well-known variance or SSQ (Sum-of-Squares) clustering criterion (see, e.g., [15, p 266]) ψ(x, α) := 1 m m
X k j=1 α ij ka i −x j k 2 −→ min, where α ij = 1 if a i ∈ A j and α ij = 0 otherwise, is used Thus, the above par- titioning problem can be formulated as the constrained optimization problem minnψ(x, α) | x ∈ R nk , α = (α ij ) ∈ R m×k , α ij ∈ {0,1}, k
X j=1 α ij = 1, i= 1, , m, j = 1, , ko, (3.1) where the centroid system x = (x 1 , , x k ) and the incident matrix α = (α ij ) are to be found.
Since (3.1) is a difficult mixed integer programming problem, instead of it one usually considers (see, e.g., [71, p 344]) the nextunconstrained nonsmooth nonconvex optimization problem minnf(x) := 1 m m
The minimum sum-of-squares clustering problem (MSSC problem) is represented by models (3.1) and (3.2) It is essential to clarify the equivalence between these minimization problems, as their decision variables reside in different Euclidean spaces For simplicity, we will denote these variables accordingly.
Basic Properties of the MSSC Problem
Given a vector ¯x = (¯x 1 , ,x¯ k ) ∈ R nk , we inductively construct k subsets
A 1 , , A k of A in the following way Put A 0 = ∅ and
In the clustering algorithm, a data point \( a_i \) is assigned to cluster \( A_j \) if the distance \( \| a_i - \bar{x}_j \| \) is the smallest compared to distances \( \| a_i - \bar{x}_q \| \) for all clusters \( q \in J \) This condition ensures that \( a_i \) is exclusively categorized into the cluster with the nearest centroid, and it does not belong to any other cluster \( A_p \).
1 ≤ p ≤ j − 1 We will call this family {A 1 , , A k } the natural clustering associated with x.¯
Definition 3.1 Let ¯x = (¯x 1 , ,x¯ k ) ∈ R nk We say that the component ¯x j of ¯x is attractive with respect to the data set A if the set
A[¯x j ] := na i ∈ A | ka i −x¯ j k= min q∈J ka i −x¯ q ko is nonempty The latter is called the attraction set of ¯x j
Clearly, the cluster A j in (3.3) can be represented as follows:
Proposition 3.1 If (¯x,α)¯ is a solution of (3.1), thenx¯is a solution of (3.2). Conversely, if x¯ is a solution of (3.2), then the natural clustering defined by (3.3) yields an incident matrix α¯ such that (¯x,α)¯ is a solution of (3.1).
Proof First, suppose that (¯x,α) is a solution of the optimization prob-¯ lem (3.1) As ψ(¯x,α)¯ ≤ψ(¯x, α) for every α = (α ij ) ∈ R m×k with α ij ∈ {0,1},
Pk j=1α ij = 1 for all i ∈ I and j ∈ J, one must have k
If ¯x does not satisfy the equation (3.2), it is possible to find a point ˜x = (˜x 1, , ˜x k) in R nk such that f(˜x) is less than f(¯x) The natural clustering related to ˜x is represented by the sets {A 1, , A k} For any indices i and j, we define ˜α ij as 1 if a i belongs to A j, and 0 if it does not This leads to the matrix ˜α = (˜α ij) in R m×k According to the natural clustering definition and the selected ˜α, we find that ψ(˜x,α) equals ˜ f(˜x) Consequently, we observe that ψ(¯x,α) equals ¯ f(¯x) and is greater than f(˜x), which contradicts the assumption that (¯x,α) is a solution to (3.1).
If ¯x is a solution to equation (3.2), then the natural clustering associated with ¯x is represented as {A 1 , , A k } The matrix ¯α is defined such that ¯αij equals 1 if a i belongs to A j and 0 otherwise It follows that ψ(¯x, α) equals ¯f(¯x) If there exists a feasible point (x, α) in (3.1) where ψ(x, α) is less than ψ(¯x, α), we can analyze the natural clustering {A˜ 1 , , A˜ k } linked to x, defining ˜α similarly This leads to the conclusion that f(x) is less than or equal to ψ(x, α), resulting in f(x) ≤ ψ(x, α) < ψ(¯x, α) = ¯f(¯x), which contradicts the assertion that ¯x is globally optimal for (3.2) Thus, the proof is established.
Proposition 3.2 If a 1 , , a m are pairwise distinct points and {A 1 , , A k } is the natural clustering associated with a global solution x¯ of (3.2), then A j is nonempty for every j ∈ J.
If there exists an index \( j_0 \in J \) such that \( A_{j_0} = \emptyset \), the assumption \( k \leq m \) leads to the conclusion that there must be another index \( j_1 \in J \) with \( A_{j_1} \) containing at least two distinct points By selecting an element \( a_{i_1} \in A_{j_1} \) such that \( a_{i_1} \neq \bar{x}_{j_1} \) and defining \( \tilde{x}_j = \bar{x}_j \) for \( j \in J \setminus \{j_0\} \) and \( \tilde{x}_{j_0} = a_{i_1} \), it can be demonstrated that \( f(\tilde{x}) - f(\bar{x}) \leq -\frac{1}{mk} \| a_{i_1} - \bar{x}_{j_1} \|^2 < 0 \).
This is impossible because ¯x is a global solution of (3.2) 2
Remark 3.1 In practical measures, some data points can coincide Natu- rally, if a i 1 = a i 2 , i 1 6= i 2 , then a i 1 and a i 2 must belong to the same cluster. Procedure (3.3) guarantees the fulfillment of this natural requirement By grouping identical data points and choosing from each group a unique rep- resentative, we obtain a new data set having pairwise distinct data points. Thus, there is no loss of generality in assuming that a 1 , , a m are pairwise distinct points.
Theorem 3.1 Both problems (3.1), (3.2) have solutions If a 1 , , a m are pairwise distinct points, then the solution sets are finite Moreover, in that case, if x¯ = (¯x 1 , ,x¯ k ) ∈ R nk is a global solution of (3.2), then the attraction set A[¯x j ] is nonempty for every j ∈ J and one has ¯ x j = 1
X i∈I(j) a i , (3.4) where I(j) := {i ∈ I |a i ∈ A[¯x j ]} with |Ω| denoting the number of elements of Ω.
To demonstrate the existence of a solution for equation (3.2), we rely on the second assertion of Proposition 3.1 The objective function in (3.2) is continuous on R^nk, as it represents the minimum of a finite number of continuous functions In the case where k equals 1, the function f can be expressed in a simplified form as f(x1) = 1/m * m.
X i=1 ka i −x 1 k 2 This smooth, strongly convex function attains its unique global minimum on R n at the point ¯x 1 = a 0 , where a 0 := 1 m
X i∈I a i (3.5) is the barycenter of the data set A (see, e.g., [50, pp 24–25] for more details).
To demonstrate the existence of a solution for equation (3.2) when k ≥ 2, we define ρ as the maximum of the differences between k times the elements of set I and a0, as specified in equation (3.5) We then introduce the closed ball ¯B(a0, 2ρ) in R^n, which is centered at a0 with a radius of 2ρ The next step involves considering the optimization problem, which seeks to minimize the function f(x) where x is represented as a vector (x1, , xk) in R^(nk), and each component xj is constrained within the closed ball B(a0, 2ρ) for all j in the index set Jo.
According to the Weierstrass theorem, the equation (3.6) has a solution ¯x = (¯x 1, , ¯x k) where each ¯x j meets the condition k¯x j − a 0 k ≤ 2ρ for all j in J For any point x = (x 1, , x k) in R nk, it follows that f(¯x) ≤ f(x) if kx j − a 0 k ≤ 2ρ for all j in J If there is at least one index j in J such that kx j − a 0 k > 2ρ, we define J 1 as the set of these indices and create a new vector ˜x = (˜x 1, , ˜x k) where ˜x j = x j for j in J\J 1 and ˜x j = a 0 for j in J 1 For any i in I, it is evident that ka i − ˜x j k = ka i − a 0 k ≤ ρ < ka i − x j k for every j in J 1, and ka i − ˜x j k = ka i − x j k for j in J \J 1 Thus, we have f(˜x) ≤ f(x), and since f(¯x) ≤ f(˜x), it follows that f(¯x) ≤ f(x), confirming that ¯x is a solution to (3.2) Furthermore, assuming that a 1, , a m are distinct points and ¯x = (¯x 1, , ¯x k) is a global solution of (3.2), the natural clustering associated with ¯x, denoted as {A 1, , A k}, ensures that A j is non-empty for all j in J, as stated in Proposition 3.2.
In the context of our analysis, for each j ∈ J, we establish that the set A_j is non-empty, leading to the conclusion that |I(j)| is at least 1 This ensures that the right-hand side of equation (3.4) is well-defined for every j in J By fixing a specific j, we note that for all i not in I(j), the distance ka_i - x̄_jk is greater than the minimum distance for any q in J Consequently, we can find a small ε > 0 such that this inequality holds for any x_j within the ball B(x̄_j, ε) By defining ˜x with components from x̄ and x_j, we can apply the inequality f(x̄) ≤ f(˜x) to conclude that f(x̄) equals a specific value.
The expression \( \sum_{i \in I(j)} \| k_i - x_j \|^2 \) indicates that for every \( x_j \in B(\bar{x}_j, \varepsilon) \), the inequality \( \phi(\bar{x}_j) \leq \phi(x_j) \) holds true This implies that \( \phi \) reaches its local minimum at \( \bar{x}_j \) According to the Fermat Rule, the gradient \( \nabla \phi(\bar{x}_j) = 0 \) results in the summation \( \sum_{i \in I(j)} \).
(a i −x¯ j ) = 0 This equality implies (3.4) Since there are only finitely many nonempty subsets Ω ⊂ I, the set B of vectors bΩ defined by formula b Ω = 1
The finite sum X i∈Ω a i indicates that the barycenter b Ω of the subsystem {a i ∈ A | i ∈ Ω} is relevant to the analysis Each component of the global solution ¯x = (¯x 1 , ,x¯ k ) for equation (3.2) must belong to set B, confirming that the solution set of (3.2) is finite when a 1 , , a m are distinct points Additionally, according to Proposition 3.1, if (¯x,α) solves (3.1), then ¯¯ x also solves (3.2) It is essential that the matrix ¯α = ( ¯α ij ) ∈ R m×k meets the criteria ¯α ij ∈ {0,1}.
X j=1 ¯ α ij = 1 for all i ∈ I, j ∈ J, it follows that the solution set of (3.1) is also finite 2
Proposition 3.3 If x¯= (¯x 1 , ,x¯ k ) ∈ R nk is a global solution of (3.2), then the components of x¯ are pairwise distinct, i.e., x¯ j 1 6= ¯x j 2 whenever j 2 6= j 1
In the proof, we assume there are distinct indexes \( j_1 \) and \( j_2 \) in set \( J \) such that \( \bar{x}_{j_1} = \bar{x}_{j_2} \) Given that \( k \leq m \), it follows that \( k-1 < n \), indicating the existence of an index \( j_0 \) in \( J \) where \( |A[\bar{x}_{j_0}]| \geq 2 \) Consequently, we can identify a data point \( a_{i_0} \) in \( A[\bar{x}_{j_0}] \) that is different from \( \bar{x}_{j_0} \) By defining \( \tilde{x} = (\tilde{x}_1, \ldots, \tilde{x}_k) \) such that \( \tilde{x}_j = \bar{x}_j \) for all \( j \in J \setminus \{j_2\} \) and \( \tilde{x}_{j_2} = a_{i_0} \), we derive that \( f(\tilde{x}) - f(\bar{x}) \leq -\frac{1}{m} \| a_{i_0} - \bar{x}_{j_0} \|^2 < 0 \) This conclusion contradicts the premise that \( \bar{x} \) is a global solution to the problem, thereby proving the initial assumption is false.
Remark 3.2 If the points a 1 , , a m are not pairwise distinct, then the con- clusions of Theorem 3.1 do not hold in general Indeed, let A = {a 1 , a 2 } ⊂ R 2 with a 1 = a 2 For k := 2, let ¯x = (¯x 1 ,x¯ 2 ) with ¯x 1 = a 1 and ¯x 2 ∈ R 2 being chosen arbitrarily Since f(¯x) = 0, we can conclude that ¯x is a global solution of (3.2) So, the problem has an unbounded solution set Similarly, for a data set A = {a 1 , , a 4 } ⊂ R 2 with a 1 = a 2 , a 3 = a 4 , and a 2 6= a 3 For k := 3, let ¯ x = (¯x 1 ,x¯ 2 ,x¯ 3 ) with ¯x 1 = a 1 , ¯x 2 = a 3 , and ¯x 3 ∈ R 2 being chosen arbitrarily.
The equation f(¯x) = 0 indicates that ¯x is a global solution to (3.2), demonstrating that the solution set of (3.2) is unbounded Additionally, if ¯x 3 is not part of {/ x¯ 1 ,x¯ 2 }, then formula (3.4) cannot be applied to ¯x 3, as the index set I(3) = {i ∈ I | a i ∈ A[¯x 3 ]} becomes empty, meaning there are no indices i for which the distance to ¯x 3 is minimized.
Formula (3.4) is effective for computing certain components of any given local solution of (3.2) The precise statement of this result is as follows.
Theorem 3.2 If x¯= (¯x 1 , ,x¯ k ) ∈ R nk is a local solution of (3.2), then (3.4) is valid for all j ∈ J whose index set I(j) is nonempty, i.e., the component ¯ x j of x¯ is attractive w.r.t the data set A.
The k-means Algorithm
Despite its ineffectiveness, the k-means clustering algorithm (see, e.g., [1, pp 89–90], [39], [43, pp 263–266], and [66]) is one of the most popular solution methods for (3.2) The convergence of this algorithms was proven in [86].
To begin the clustering process, select k initial centroids \( x_1, \ldots, x_k \) in \( \mathbb{R}^n \) Next, iteratively construct k subsets \( A_1, \ldots, A_k \) from the data set A, starting with \( A_0 = \emptyset \) and applying the clustering rule This process results in the natural clustering corresponding to the selected centroids After forming the clusters, update each centroid \( x_j \) for non-empty subsets \( A_j \) using the formula \( x_j \leftarrow x_{e_j} := \frac{1}{|A_j|} \sum_{x \in A_j} x \).
The algorithm operates by iteratively updating the centroid system {x₁, , xₖ} until stability is achieved, meaning that for all j ∈ J with Aₗ ≠ ∅, the condition xₑⱼ = xⱼ holds The computation process is defined by the equation X i∈I (A j ) a i, where I(A j ) represents the set of indices i in I for which a i belongs to A j, while ensuring that xⱼ remains unchanged otherwise.
Input: The data set A= {a 1 , , a m } and a constant ε ≥ 0 (tolerance). Output: The set of k centroids {x 1 , , x k }.
Step 1 Select initial centroids x j ∈ R n for all j ∈ J.
Step 2 Compute α i = min{ka i −x j k | j ∈ J} for all i ∈ I.
Step 4 Update the centroids x j satisfying A j 6= ∅ by the rule (3.9), keeping other centroids unchanged.
Step 5 Check the convergence condition: If kxe j −x j k ≤ ε for all j ∈ J with
A j 6= ∅ then stop, else go to Step 2.
The following example is designed to show how the algorithm is performed in practice.
Example 3.1 Choose m = 3, n = 2, and k = 2 Let A = {a 1 , a 2 , a 3 }, where a 1 = (0,0), a 2 = (1,0), a 3 = (0,1) Apply the k-means algorithm to solve the problem (3.2) with the tolerance ε = 0.
(a) With the starting centroids x 1 = a 1 , x 2 = a 2 , one obtains the clusters
A 1 = A[x 1 ] = {a 1 , a 3 } and A 2 = A[x 2 ] = {a 2 } The updated centroids are x 1 = (0, 1 2 ), x 2 = a 2 Then, the new clusters A 1 and A 2 coincide with the old ones Thus, kxe j −x j k = 0 for all j ∈ J with A j 6= ∅ So, the computation terminates For x 1 = (0, 1 2 ), x 2 = a 2 , one has f(x) = 1 6
(b) Starting with the points x 1 = ( 1 4 , 3 4 ) and x 2 = (2,3), one gets the clusters A 1 = A[x 1 ] = {a 1 , a 2 , a 3 } and A 2 = A[x 2 ] = ∅ The algorithm gives the centroid system x 1 = ( 1 3 , 1 3 ), x 2 = (2,3), and f(x) = 1 3
(c) Starting with x 1 = (0,1) and x 2 = (0,0), by the algorithm we are led to A 1 = A[x 1 ] = {a 3 }, A 2 = A[x 2 ] = {a 1 , a 2 }, x 1 = (0,1), and x 2 = ( 1 2 ,0). The corresponding value of objective function is f(x) = 1 6
(d) Starting with x 1 = (0,0) and x 2 = ( 1 2 , 1 2 ), by the algorithm one gets the results A 1 = A[x 1 ] = {a 1 }, A 2 = A[x 2 ] = {a 2 , a 3 }, x 1 = (0,0), and x 2 = ( 1 2 , 1 2 ). The corresponding value of objective function is f(x) = 4 9
3 ,0) as the initial centroids, one obtains the results A 1 = A[x 1 ] = {a 1 , a 2 , a 3 }, A 2 = A[x 2 ] = ∅, x 1 = ( 1 3 , 1 3 ), x 2 = (1 +
The relationship between the MSSC problem and the k-means clustering algorithm raises uncertainty regarding the optimality of the five centroid systems derived from items (a)–(e) in Example 3.1 While it is confirmed that the centroid systems in (a) and (c) represent global optimal solutions, the status of the centroid systems in (b), (d), and (e) as local optimal solutions remains indeterminate.
The theoretical results in Section 3.2 and the two forthcoming ones allow us to clarify the following issues related to the MSSC problem in Example3.1:
- The structure of the global solution set (see Example 3.2 below);
- The structure of the local solution set (see Example 3.3);
- The performance of the k-means algorithm (see Example 3.4).
The analysis reveals that the centroid systems in (a) and (c) represent global optimal solutions, whereas those in (b) and (d) are classified as local, non-global optimal solutions Additionally, the centroid system in (e) does not qualify as a local solution, even though the k-means algorithm converges to it and its objective function value matches that of the centroid system in (d).
Characterizations of the Local Solutions
To analyze the local solution set of the problem outlined in (3.2), we will adopt the approach of Ordin and Bagirov, focusing on a recognized optimality condition in DC programming For any vector x = (x1, , xk) within the space Rnk, the function is defined as f(x) = 1/m.
Hence, the objective function f of (3.2) can be expressed [71, p 345] as the difference of two convex functions f(x) =f 1 (x)−f 2 (x), (3.12) where f 1 (x) := 1 m
The function f1 is identified as a convex linear-quadratic function, which is also differentiable In contrast, f2 is a nonsmooth convex function formed by the sum of several nonsmooth convex functions, applicable across the entire space R^nk The subdifferentials for both f1(x) and f2(x) can be calculated accordingly.
= {2(x 1 −a 0 , , x k −a 0 )} where, as before, a 0 = b A is the barycenter of the system {a 1 , , a m } Set ϕ i (x) = max j∈J h i,j (x) (3.15) with h i,j (x) := X q∈J\{j} ka i −x q k 2 and
J i (x) = j ∈ J | a i ∈ A[x j ] (3.17) Proof From the formula of h i,j (x) it follows that h i,j (x) = X q∈J ka i −x q k 2 − ka i −x j k 2
Therefore, by (3.15) we have ϕ i (x) = max j∈J h X q∈J ka i −x q k 2 − ka i −x j k 2 i
Thus, the maximum in (3.15) is attained when the minimum min j∈J ka i −x j k 2 is achieved So, by (3.16),
J i (x) =nj ∈ J | ka i −x j k= min q∈J ka i −x q ko.
Utilizing the subdifferential formula for the maximum function, as outlined in Proposition 2.3.12, we note that the Clarke generalized gradient aligns with the subdifferential of convex analysis when applied to convex functions.
∂ϕ i (x) = co{∇h i,j (x)|j ∈ J i (x)} = co 2 xe j −ea i,j | j ∈ J i (x) , (3.18) where xe j = x 1 , , x j−1 ,0 R n , x j+1 , , x k (3.19) and ea i,j = a i , , a i , 0 R n j−th position|{z}
By the Moreau-Rockafellar theorem [84, Theorem 23.8], one has
Let x = (x1, , xk) ∈ Rnk represent a local solution of equation (3.2) According to the necessary optimality condition in DC programming, which is derived from the optimality criteria established by Dem’yanov et al in quasidifferential calculus, we can conclude certain properties about this solution.
Since ∂f 1 (x) is a singleton, ∂f 2 (x) must be a singleton too This happens if and only if ∂ϕi(x) is a singleton for every i ∈ I By (4.50), if |J i (x)| = 1, then
In the scenario where |J i (x)| exceeds 1, we can choose two elements, j1 and j2, from J i (x) with the condition that j1 is less than j2 Given that ∂ϕ i (x) is a singleton, it follows from equation (4.50) that xe j1 − ea i,j1 equals xe j2 − ea i,j2 By applying equations (3.19) and (3.20), we can conclude that this equality holds true if and only if x j1 equals x j2, which is equal to a i To advance our analysis, we must establish a specific condition on the local solution x.
(C1) The components of x are pairwise distinct, i.e., x j 1 6= x j 2 whenever j 2 6= j 1
Definition 3.2 A local solution x = (x 1 , , x k ) of (3.2) that satisfies (C1) is called a nontrivial local solution.
Remark 3.5 Proposition 3.3 shows that every global solution of (3.2) is a nontrivial local solution.
The essential principles outlined in [71, pp 346] are further refined here According to (3.17), the initial statement of the subsequent theorem indicates that if x represents a nontrivial local solution, then for every data point a_i within set A, there exists a unique component x_j of x such that a_i is included in A[x_j].
Theorem 3.3 (Necessary conditions for nontrivial local optimality) Suppose that x = (x 1 , , x k ) is a nontrivial local solution of (3.2) Then, for any i ∈ I, |J i (x)| = 1 Moreover, for every j ∈ J such that the attraction set A[x j ] of x j is nonempty, one has x j = 1
X i∈I(j) a i , (3.23) where I(j) ={i ∈ I | a i ∈ A[x j ]} For any j ∈ J with A[x j ] = ∅, one has x j ∈ A[x],/ (3.24) where A[x] is the union of the balls B(a¯ p ,ka p − x q k) with p ∈ I, q ∈ J satisfying p∈ I(q).
In the proof, we consider a nontrivial local solution \( x = (x_1, \ldots, x_k) \) of equation (3.2) For any index \( i \) in set \( I \), it is established that the Jacobian \( |J_i(x)| \) must equal 1 If \( |J_i(x)| \) were greater than 1, it would imply the existence of indices \( j_1 \) and \( j_2 \) within \( J_i(x) \) such that \( x_{j_1} = x_{j_2} = a_i \), which contradicts the assumption of the nontriviality of the local solution \( x \) Consequently, we define \( J_i(x) = \{j(i)\} \) for \( i \in I \), indicating that \( j(i) \) is the unique element of \( J_i(x) \).
For each i ∈ I, observe by (3.15) that hi,j(x) < hi,j(i)(x) = ϕi(x) ∀j ∈ J \ {j(i)}.
Hence, by the continuity of the functions h i,j (x), there exists an open neigh- borhood U i of x such that h i,j (y) < h i,j(i) (y) ∀j ∈ J \ {j(i)}, ∀y ∈ U i
So, ϕ i (.) is continuously differentiable on U i Put U = \ i∈I
U i From (3.14) and (3.25) one can deduce that f 2 (y) = 1 m
Therefore, f 2 (y) is continuously differentiable function on U Moreover, the formulas (4.50)–(3.20) yield
(ye j(i) −ea i,j(i) ) ∀y ∈ U, (3.26) where ye j(i) = y 1 , , y j(i)−1 ,0 R n , y j(i)+1 , , y k and ea i,j(i) = a i , , a i , 0 R n
Substitutingy = xinto (3.26) and combining the result with (3.22), we obtain
Now, fix an index j ∈ J with A[x j ] 6= ∅ and transform the left-hand side of (3.27) as follows:
If j(i) equals j, the j-th component of the vector xe j(i) − ea i,j(i) in R nk is zero Conversely, if j(i) does not equal j, the j-th component of the vector becomes x j − a i This leads us to the conclusion presented in equation (3.27).
Since ma 0 = a 1 +ã ã ã+a m , this yields X i∈I(j) a i = |I(j)|x j Thus, formula (3.23) is valid for any j ∈ J satisfying A[x j ] 6= ∅.
For any index \( j \) in set \( J \) where \( A[x_j] \) is empty, we can derive a contradiction Assume there exists an index \( j_0 \) in \( J \) such that \( A[x_{j_0}] \) is empty, and for some indices \( p \) in \( I \) and \( q \) in \( J \), \( p \) belongs to \( I(q) \) and \( x_{j_0} \) is within the ball defined by \( a_p \) and the distance to \( x_q \) If the distance from \( a_p \) to \( x_{j_0} \) equals that to \( x_q \), then the set \( J_p(x) \) would include both \( q \) and \( j_0 \), which contradicts the theorem's first claim Conversely, if the distance from \( a_p \) to \( x_{j_0} \) is less than that to \( x_q \), then \( p \) cannot belong to \( I(q) \), leading to another contradiction.
The optimality condition outlined in the theorem is sufficient, and when combined with Theorem 3.3, it offers a comprehensive characterization of the nontrivial local solutions for equation (3.2).
Theorem 3.4 (Sufficient conditions for nontrivial local optimality) Suppose that a vector x = (x 1 , , x k ) ∈ R nk satisfies condition (C1) and |J i (x)| = 1 for every i ∈ I If (3.23) is valid for any j ∈ J with A[x j ] 6= ∅ and (3.24) is fulfilled for any j ∈ J with A[x j ] = ∅, then x is a nontrivial local solution of (3.2).
Proof Let x = (x 1 , , x k ) ∈ R nk be such that (C1) holds, J i (x) = {j(i)} for every i ∈ I, (3.23) is valid for any j ∈ J with A[x j ] 6= ∅, and (3.24) is satisfied for any j ∈ J with A[x j ] = ∅ Then, for all i ∈ I and j 0 ∈ J \ {j(i)}, one has ka i −x j(i) k < ka i −x j 0 k.
There exists a positive ε and a specific index q within set J such that the distance between the vectors a_i and x_{e_j(i)} is less than the distance between a_i and x_{e_{j'}} for all i in I and j' in J excluding j(i) This condition holds true whenever the vector x_e = (x_{e1}, , x_{ek}) in R^{nk} satisfies the criterion that the distance between x_e_q and x_q is less than ε for all q in J Additionally, based on the compactness of the set A[x], by reducing ε if necessary, we can ensure that x_{e_j} belongs to the set A[/ x]e whenever the vector x_e meets the aforementioned distance condition Here, A[x] is defined as the union of the balls ¯e B(a_p, ||a_p - x_e_q||) for p in I and q in J, where p is part of the index set I(q) that includes all i in I such that a_i belongs to A[x_q].
Fix an arbitrary vector xe = (xe 1 , ,xe k ) ∈ R nk with the property that kxe q − x q k< ε for all q ∈ J Then, by (3.28) and (3.29), J i (x) =e {j(i)} So, minj∈J ka i −xe j k 2 = ka i −xe j(i) k 2 Therefore, one has f(x) =e 1 m
X i∈I (j) ka i −x j k 2 = f(x), where the inequality is valid because (3.23) obviously yields
X i∈I(j) ka i −x j k 2 ≤ X i∈I(j) ka i −xe j k 2 for every j ∈ J such that the attraction set A[x j ] of x j is nonempty (Note that x j is the barycenter of A[x j ].)
The local optimality of x = (x 1 , , x k ) has been proved Hence, x is a nontrivial local solution of (3.2) 2
Example 3.2 (A local solution need not be a global solution) Consider the clustering problem described in Example 3.1 Here, we have I = {1,2,3}and
According to Theorem 3.1, the problem (3.2) possesses a global solution for the set J = {1,2} If x = (x₁, x₂) ∈ R²×² is identified as a global solution, then the attraction set A[x₁] is nonempty for every j ∈ J Remark 3.5 confirms that x represents a nontrivial local solution, and by applying Theorem 3.3, we establish that the attraction sets A[x₁] and A[x₂] are disjoint The barycenters of these sets can be calculated using formula (3.23) It follows that A = A[x₁] ∪ A[x₂], and since A[x₁] and A[x₂] are subsets of A = {a₁, a₂, a₃}, we can conclude that the global solution set of our problem is included in the set defined as nx¯ := (1.
In our analysis, we find that f(¯x) equals 1/3, while f(ˆx) and f(x) both equal e^(1/6), indicating that ˆx and xe are global solutions to the problem According to Theorem 3.4, we can conclude that ¯x is a local solution However, it is important to note that ¯x does not belong to the set of global solutions, classifying it as a local-nonglobal solution of the problem.
Stability Properties
This section focuses on demonstrating the local Lipschitz property of the optimal value function, the local upper Lipschitz property of the global solution map, and the Lipschitz-like characteristics of the local solution map associated with equation (3.2).
In the context of problem (3.2), let the data set A = {a₁, , aₘ} be variable, represented as a = (a₁, , aₘ) in Rⁿᵐ The optimal value of the problem, denoted as v(a), can be expressed as v(a) = min{f(x) | x = (x₁, , xₖ) ∈ Rⁿᵏ} Additionally, the global solution set for this problem is represented by F(a).
Let us abbreviate the local solution set of (3.2) to F1(a) Note that the inclusion F(a) ⊂ F 1 (a) is valid, and it may be strict.
Definition 3.3 A family {I(j) | j ∈ J} of pairwise distinct, nonempty sub- sets of I is said to be a partition of I if [ j∈J
From now on, let ¯a = (¯a 1 , ,¯a m ) ∈ R nm be a fixed vector with the property that a¯ 1 , ,a¯ m are pairwise distinct.
Theorem 3.5 (Local Lipschitz property of the optimal value function) The optimal value function v : R nm → R is locally Lipschitz at ¯a, i.e., there exist
|v(a)−v(a 0 )| ≤ L 0 ka−a 0 k for all a and a 0 satisfying ka−ak¯ < δ 0 and ka 0 −¯ak< δ 0
Proof Denote by Ω the set of all the partitions of I Every element ω of Ω is a family {I ω (j) | j ∈ J} of pairwise distinct, nonempty subsets of I with [ j∈J
I ω (j) = I We associate to each pair (ω, a), where a = (a 1 , , a m ) ∈ R nm and ω ∈ Ω, a vector x ω (a) = (x 1 ω (a), , x k ω (a)) ∈ R nk with x j ω (a) = 1
According to Theorem 3.1, the problem defined in equation (3.2) has a finite number of global solutions, indicating that F(¯a) is both nonempty and finite For each solution ¯x = (¯x 1 , ,x¯ k ) within F(¯a), there exists an element ω in Ω such that ¯x j equals x j ω (¯a) for all j in J The set Ω 1 = {ω 1 , , ω r } consists of elements from Ω that correspond to these global solutions Furthermore, it is established that for any ω in Ω not included in Ω 1, the inequality f(x ω 1 (¯a), a)¯ is less than f(x ω (¯a), ¯a) holds true, where f(x, a) is defined as 1/m.
For each pair (i, j) in the set I×J, the rule (x, a) 7→ ka i − x j k 2 defines a polynomial function on R nk × R nm This function is locally Lipschitz within its domain, which allows us to conclude, based on [20, Prop 2.3.6 and 2.3.12], that the function f(x, a) described in (3.34) is also locally Lipschitz on R nk × R nm.
Now, observe that for any ω ∈ Ω and j ∈ J, the vector function x j ω (.) in (3.32), which maps R nm to R n , is continuously differentiable In particular, it is locally Lipschitz on R nm
For every ω in the set Ω, we can conclude that the function g ω (a) := f(x ω (a), a) is locally Lipschitz on R nm By rewriting the inequality g ω 1 (¯a) < g ω (¯a) for all ω in Ω\Ω 1 and utilizing the continuity of the functions g ω (.), we can establish the existence of a positive number δ 0 such that g ω 1 (a) < g ω (a) for all ω in Ω\Ω 1, provided that the distance ka−ak¯ is less than δ 0 Given that the points ¯a 1, , ¯a m are distinct, we can assume, without loss of generality, that the points a 1, , a m are also distinct for any a = (a 1, , a m) where ka−ak¯ is less than δ 0.
Consider a vector \( a = (a_1, , a_m) \) that meets the condition \( ka - \bar{a} < \delta_0 \) According to equation (3.35), it follows that \( f(x_{\omega_1}(a), a) < f(x_{\omega}(a), a) \) for all \( \omega \in \Omega \setminus \Omega_1 \) Since \( f(., a) \) represents the objective function of equation (3.2), this indicates that the set \( \{x_{\omega}(a) | \omega \in \Omega \setminus \Omega_1\} \) does not include any global solutions to the problem By applying Theorem 3.1, we can conclude that the global solution set \( F(a) \) of equation (3.2) is contained within a specific set.
The set F(a) is defined as a subset of {x ω (a) | ω ∈ Ω 1 }, which includes elements {x ω 1 (a), , x ω r (a)} Given that F(a) is not empty, we can express v(a) as the minimum of f(x, a) over all x in F(a), leading to v(a) = min{f(x ω ` (a), a) | ` = 1, , r} Consequently, we establish that v(a) equals the minimum of g ω ` (a) for all ` in the specified range, under the condition that ka−¯ak is less than δ 0 It is important to note that the functions g ω, where ω belongs to Ω, are locally Lipschitz on R nm Therefore, by applying relevant propositions from the literature to the minimum function, we conclude that v is locally Lipschitz at the point ¯ a.
Theorem 3.6 (Local upper Lipschitz property of the global solution map)The global solution map F : R nm ⇒ R nk is locally upper Lipschitz at ¯a, i.e., there exist L >0 and δ > 0 such that
F(a) ⊂ F(¯a) + Lka−¯akB¯ R nk (3.38) for all a satisfying ka−ak¯ < δ Here
B¯ R nk := nx = (x 1 , , x k ) ∈ R nk | X j∈J kx j k ≤ 1o denotes the closed unit ball of the product space R nk , which is equipped with the sum norm kxk = X j∈J kx j k.
In the proof, let Ω and Ω 1 be defined as {ω 1, , ω r}, with the vector function x ω (a) represented as (x 1 ω (a), , x k ω (a)) in R nk The construction of δ 0 follows the methodology outlined in the previous theorem For each ω in Ω, the function x ω (.) is continuously differentiable, allowing us to establish the existence of constants L ω > 0 and δ ω > 0 This leads to the inequality kx ω (a)−xω(ea)k ≤ Lωka−eak when the conditions ka−ak¯ < δ ω and kea−¯ak < δ ω are met.
Then, for every a satisfying ka−¯ak < δ, by (3.36) and (3.39) one has
= F(¯a) +Lka−ak¯ B¯ R nk Hence, inclusion (3.38) is valid for every a satisfying ka−ak¯ < δ 2
Theorem 3.7 (Aubin property of the local solution map) Let x¯ = (¯x 1 , ,x¯ k ) be an element of F1(¯a) satisfying condition (C1), that is, x¯ j 1 6= ¯x j 2 whenever j2 6= j1 Then, the local solution map F1 : R nm ⇒ R nk has the Aubin property at (¯a,x), i.e., there exist¯ L 1 > 0, ε > 0, and δ 1 > 0 such that
F 1 (a)∩B(¯x, ε) ⊂ F 1 (ea) +L 1 ka−eakB¯ R nk (3.40) for all a and ea satisfying ka−¯ak < δ 1 and kea−¯ak< δ 1
Proof Suppose that ¯x = (¯x 1 , ,x¯ k ) ∈ F 1 (¯a) and ¯x j 1 6= ¯x j 2 for all j 1 , j 2 ∈ J with j2 6= j1 Denote by J1 the set of the indexes j ∈ J such that ¯x j is attractive w.r.t the data set {¯a 1 , ,¯a m } Put J2 = J\J1 For every j ∈ J1, by Theorem 3.3 one has k¯a i −x¯ j k< k¯a i −x¯ q k (∀i ∈ I(j), ∀q ∈ J \ {j}) (3.41)
In addition, the following holds: ¯ x j = 1
X i∈I(j) ¯ a i , (3.42) where I(j) = {i ∈ I | ¯a i ∈ A[¯x j ]} For every j ∈ J 2 , by Theorem 3.3 one has k¯x q −a¯ p k < k¯x j −a¯ p k (∀q ∈ J 1 , ∀p ∈ I(q)) (3.43) Let ε0 > 0 be such that k¯x j 1 −x¯ j 2 k > ε0 for all j1, j2 ∈ J with j2 6= j1.
By (3.41) and (3.43), there exist δ 0 > 0 and ε ∈ 0,ε 0
In the given context, it is established that for all \( a \) in \( R^{nm} \) and \( x \) in \( R^{nk} \), certain inequalities must hold: \( \|ka_i - x_j\| < \|ka_i - x_q\| \) for all \( j \in J_1 \), \( i \in I(j) \), and \( q \in J \setminus \{j\} \), as well as \( \|x_q - a_p\| < \|x_j - a_p\| \) for all \( j \in J_2 \), \( q \in J_1 \), and \( p \in I(q) \) Additionally, the conditions \( \|a - \bar{a}\| < \delta_0 \) and \( \|x - \bar{x}\| < 2k\epsilon \) are essential Given that \( \bar{x}_{j_1} \neq \bar{x}_{j_2} \) for all distinct \( j_1, j_2 \in J \), by potentially reducing \( \epsilon > 0 \), it follows that any \( x \) satisfying \( \|x - \bar{x}\| < 2k\epsilon \) will ensure \( x_{j_1} \neq x_{j_2} \) for all \( j_1, j_2 \in J \) where \( j_2 \neq j_1 \).
For every j ∈ J1 and a = (a 1 , , a m ) ∈ R nm , define x j (a) = 1
Comparing equations (3.46) and (3.42) shows that \( x_j(\bar{a}) = \bar{x}_j \) for every \( j \in J_1 \) Due to the continuity of the vector functions \( x_j(.) \) for \( j \in J_1 \), we can assume that the difference \( \| x_j(e_a) - \bar{x}_j \| < \epsilon \) holds for all \( j \in J_1 \) Here, \( e_a = (e_{a1}, , e_{am}) \in \mathbb{R}^{nm} \) is chosen such that \( \| e_a - a \bar{k} \| < \delta_0 \), where a smaller \( \delta_0 > 0 \) can be selected if needed.
The vector functions \( x_j(.) \) for \( j \in J_1 \) are continuously differentiable, ensuring the existence of a constant \( L_1 > 0 \) This leads to the inequality \( \| x_j(a) - x_j(ea) \| \leq L_1 \| a - ea \| \) for any points \( a \) and \( ea \) where \( \| a - \bar{a} \| < \delta_0 \) and \( \| ea - \bar{a} \| < \delta_0 \) If necessary, a smaller \( \delta_0 > 0 \) can be chosen Additionally, selecting \( \delta_1 \in (0, \delta_0) \) such that \( 2L_1 \delta_1 < \epsilon \) further refines the conditions.
With the chosen constants L 1 > 0, ε > 0, and δ 1 > 0, let us show that the inclusion (3.40) is fulfilled for all a and ea satisfying ka − ¯ak < δ 1 and kea−ak¯ < δ 1
Let \( \eta \) and \( \epsilon \) be defined such that \( k - \bar{a} < \delta_1 \) and \( k_e - \bar{a} < \delta_1 \) Choose an arbitrary element \( x = (x_1, \ldots, x_k) \) from the intersection of the set \( F_1(a) \) and the ball \( B(\bar{x}, \epsilon) \) For all \( j \in J_1 \), define \( x_{e_j} = x_j(e_a) \), where \( x_j(a) \) is specified by equation (3.46) For any \( j \in J_2 \), simply set \( x_{e_j} = x_j \).
Claim 1 The vector xe = (xe 1 , ,xe k ) belongs to F1(ea).
The inequalities ka − ¯ak < δ 1 and kx − xk¯ < ε indicate that properties (3.44) and (3.45) hold true From property (3.44), it can be concluded that for every element j in set J1, the attraction set A[xj] consists of elements {ai | i ∈ I(j)} Given that I(j) is not empty for each j in J1 and x is in F1(a), Theorem 3.3 confirms that xj equals 1.
Comparing (3.48) with (3.46) yields x j = x j (a) for all j ∈ J 1 By (3.45) we see that, for every j ∈ J 2 , the attraction set A[x j ] is empty Moreover, one has x j ∈ A[x]/ (∀j ∈ J 2 ) (3.49) where A[x] is the union of the balls ¯B(a p ,ka p − x q k) with p ∈ I, q ∈ J satisfying p ∈ I(q).
For each j ∈ J 1 , using (3.47) we have kx j (ea)−x¯ j k ≤ kx j (ea)−x j (a)k+ kx j (a)−x¯ j k
Besides, for each j ∈ J 2 , we have kx j (ea)−x¯ j k = kx j −x¯ j k < ε Therefore, kxe−xk¯ = X j∈J 1 kx j (ea)−x¯ j k+X j∈J 2 kx j −x¯ j k< 2kε.
The inequality kea−ak¯ < δ 1 ensures that the properties (3.44) and (3.45) are maintained, where ea and x(ea) represent the variables a and x, respectively This leads to the conclusion that for all j in J 1, i in I(j), and q in J excluding j, the condition kea i −xe j k < kea i −xe q k (3.50) holds Additionally, for all j in J 2, q in J 1, and p in I(q), the inequality kxe q −ea p k < kxe j −ea p k (3.51) is satisfied.
So, similar to the above case of x, for every j ∈ J 1 , the attraction set A[xe j ] is {ea i | i ∈ I(j)} Recall that I(j) 6= ∅ for each j ∈ J 1 and xe j was given by xe j = x j (ea) = 1
Conclusions
The minimum sum-of-squares clustering problem consistently yields a global solution, and under certain mild conditions, this solution set is finite Each global solution's components can be calculated using an explicit formula By introducing the concept of a nontrivial local solution, we have established the necessary and sufficient conditions for a set of centroids to qualify as a nontrivial local solution.
We have demonstrated the local Lipschitz property of the optimal value function and the local upper Lipschitz property of the global solution map in the MSSC problem Additionally, we established a local Lipschitz-like property for the local solution map These comprehensive characterizations of nontrivial local solutions enhance our understanding of the k-means algorithm's performance.
Some Incremental Algorithms for the Clustering Problem
Solution methods for the minimum sum-of-squares clustering (MSSC) prob- lem will be analyzed and developed in this chapter.
This article proposes enhancements to the incremental algorithms developed by Ordin and Bagirov, as well as Bagirov's own work, utilizing the Difference-of-Convex functions Algorithms (DCAs) within DC programming and the qualitative properties of the MSSC problem We present the key features of the new algorithms, focusing on their finite convergence, overall convergence, and rate of convergence Additionally, we provide results from numerical tests conducted on various real-world databases to demonstrate the effectiveness of these algorithms.
The present chapter is written on the basis of paper No 3 and paper No 4 in the List of author’s related papers (see p 112).
There are many algorithms to solve the MSSC problem (see, e.g., [6,7,9,12,
The problem is classified as NP-hard when the number of data features or clusters is included as input, which explains why existing algorithms can typically provide only local solutions.
The k-means clustering algorithm (see Section 3.3 and see also, e.g., [1],
[39], [43], and [66]) is the best known solution method for the MSSC problem.
To improve its effectiveness, the global k-means, modified global k-means, and fast global k-means clustering algorithms have been proposed in [6,12,33,49,
The quality of computational results is heavily influenced by the choice of starting points, making it essential to identify effective ones The Difference-of-Convex-functions Algorithms (DCA), previously utilized for the MSSC problem, can serve this purpose effectively.
Incremental clustering algorithms increase the number of clusters gradually, making them highly efficient for managing large datasets, as supported by various numerical results.
Ordin and Bagirov recently introduced an incremental clustering algorithm that utilizes control parameters to identify effective starting points for the k-means algorithm Additionally, in a previous work, Bagirov proposed another incremental clustering method grounded in DC programming and DCA.
We will propose several improvements of the just mentioned incremental al- gorithms to solve the MSSC problem in (3.2).
Incremental clustering algorithms, as discussed in references [7, 44, 71], begin by calculating the centroid of the entire dataset and then strategically add one new centroid at each step This iterative process continues until the desired number of k centroids is achieved for the specified problem (3.2).
This article focuses on the analysis and enhancement of the incremental heuristic clustering algorithm by Ordin and Bagirov, as well as Bagirov's incremental DC clustering algorithm Through the construction of specific MSSC problems using small datasets, we demonstrate the functionality of these algorithms Notably, we discover that the second algorithm may not terminate due to its precise stopping criterion To address this issue, we propose a modified version of the incremental heuristic clustering algorithm and introduce three modifications for the incremental DC clustering algorithm.
This section is devoted to the incremental heuristic algorithm of Ordin andBagirov [71, pp 349–353] and some properties of the algorithm.
Let ` be an index with 1 ≤ ` ≤ k − 1 and let ¯x = (¯x 1 , ,x¯ ` ) be an approximate solution of (3.2), where k is replaced by ` So, ¯x = (¯x 1 , ,x¯ ` ) solves approximately the problem minnf ` (x) := 1 m m
Applying the natural clustering procedure described in (3.3) to the centroid system x¯ 1 , ,x¯ ` , one divides A into ` clusters with the centers ¯x 1 , ,x¯ ` For every i ∈ I, put d ` (a i ) = min k¯x 1 −a i k 2 , ,k¯x ` −a i k 2 (4.2) The formula g(y) =f`+1(¯x 1 , ,x¯ ` , y) where, in accordance with (4.1), f `+1 (x) = 1 m m
X i=1 min j=1, ,`+1ka i −x j k 2 ∀x = (x 1 , , x ` , x `+1 ) ∈ R n(`+1) , defines our auxiliary cluster function g : R n →R From (4.2) it follows that g(y) = 1 m m
The problem min g(y) | y ∈ R n (4.4) is called the auxiliary clustering problem For each i ∈ I, one has min d ` (a i ), ky−a i k 2 = d ` (a i ) +ky −a i k 2 −max d ` (a i ),ky−a i k 2
So, the objective function of (4.4) can be represented as g(y) =g 1 (y)−g 2 (y), where g 1 (y) = 1 m m
X i=1 ky −a i k 2 (4.5) is a smooth convex function and g 2 (y) = 1 m m
X i=1 max d ` (a i ),ky −a i k 2 (4.6) is a nonsmooth convex function Consider the open set
B a i , d ` (a i ) = y ∈ R n | ∃i ∈ I with ky −a i k 2 < d ` (a i ) , (4.7) which is the finite union of certain open balls with the centers a i (i ∈ I), and put
All points \( \bar{x}_1, \ldots, \bar{x}_\ell \) are contained within \( Y_2 \) Given that \( \ell < k \leq m \) and the data points \( a_1, \ldots, a_m \) are distinct, there must be at least one index \( i \in I \) such that \( d_\ell(a_i) > 0 \) This indicates that not all data points can coincide with the set \( \bar{x}_1, \ldots, \bar{x}_\ell \), confirming that \( Y_1 \) is not empty According to equations (4.5) and (4.6), we can conclude that \( g(y) < \frac{1}{m} m \).
Therefore, any iteration process for solving (4.4) should start with a point y 0 ∈ Y 1
To find an approximate solution of (3.2) where k is replaced by `+ 1, i.e., the problem minnf `+1 (x) := 1 m m
(4.8) we can use the following procedure [71, pp 349–351] Fixing any y ∈ Y 1 , one divides the data set A into two disjoint subsets
Clearly, A 1 (y) consists of all the data points standing closer to y than to their cluster centers Since y ∈ Y 1 , the set A 1 (y) is nonempty Note that g(y) = 1 m
The equation z `+1 (y) = f ` (¯x) − g(y) indicates that f ` (¯x) is derived from the current centroid system x¯ 1, , x¯ `, while g(y) represents the addition of a new center y, forming `+1 centers A positive value of z `+1 (y) > 0 signifies a reduction in the minimum sum-of-squares clustering criterion when transitioning from the existing centroid system to the new configuration This relationship underscores the effectiveness of incorporating an additional center to enhance clustering performance.
X a i ∈A d`(a i ) and (4.10), one has the representation z `+1 (y) = 1 m
X a i ∈A 1 (y) d ` (a i )− ky−a i k 2 , which can be rewritten as z `+1 (y) = 1 m
Subsequent operations are heavily influenced by the data points in Y1 It can be demonstrated that an element a is part of the intersection A∩Y1 if and only if it belongs to A and is not included in the set {¯x 1, , x¯ `} For each point y = a within A∩Y1, the value z `+1 (a) is calculated using equation (4.11) Finally, the maximum value z max 1 is determined as the highest z `+1 (a) for all a in A∩Y1, as expressed in equation (4.12).
The selection of effective starting points for solving equation (4.8) is determined by two parameters, γ 1 and γ 2, both ranging from 0 to 1 The significance of each parameter will be discussed further The authors of [71] describe their algorithm as heuristic, as the choice of these parameters can be informed by computational experience from applying the algorithm.
Using γ 1 , one can find the set
When γ 1 = 0, the set ¯A 1 is defined as A ∩ Y 1, encompassing all data points within Y 1 Conversely, when γ 1 = 1, ¯A 1 is limited to the data points that result in the maximum decrease, z max 1 This indicates that γ 1 serves as a tolerance level for selecting suitable points from A ∩ Y 1 For each point a in ¯A 1, the corresponding set A1(a) is identified, and its barycenter, denoted as c(a), is calculated The point a is then replaced by c(a), as it better represents the set A1(a) Given that g(c(a)) ≤ g(a) < f`(¯x), it follows that c(a) is included in Y 1.
For each c ∈ A¯ 2 , one computes the value z `+1 (c) by using (4.11) Then, we find z max 2 := max z `+1 (c) | c ∈ A¯ 2 (4.15)
Clearly, z max 2 is the largest decrease among the values f `+1 (¯x 1 , ,x¯ ` , c), where c ∈ A¯ 2 , in comparison with the value f ` (¯x).
When γ 2 = 0, it follows that ¯A 3 is equivalent to ¯A 2 In contrast, when γ 2 = 1, ¯A 3 includes barycenters c ∈ A¯ 2 that yield the most significant reduction in the objective function g(y) = f `+1 (¯x 1, , x¯ `, y) as outlined in equation (4.4) This approach, as referenced in [71, p 315], aligns with the identification of an optimal starting point in the modified global k-means algorithm proposed by Bargirov in [6] Therefore, γ 2 serves as a measure of tolerance in the selection of suitable points from A¯ 2.
Ω := (¯x 1 , ,x¯ ` , c) | c ∈ A¯ 3 (4.17) contains the ‘good’ starting points to solve (4.8).
4.2.2 Version 1 of Ordin-Bagirov’s algorithm