4.4 SOLVING THE OPTIMAL LOT-SIZING PROBLEM AS A STOCHASTIC
4.4.2 S TOCHASTIC F LOW M ODEL (SFM) FOR THE L OT -S IZING P ROBLEM
4.4.2.3 Applying the General Abstraction Scheme to the Lot-Sizing Problem
In the following, we apply the general abstraction scheme of Section 4.2 to the lot-sizing problem limited to two classes and show how the SFM of Figure 4.4 is formally obtained. Using the same notation as earlier, xiis the number of jobs in the input batching queue of class i with lot sizeθi, and y is the queue length of the output batching queue. Each class is associated with a setup time denoted by si, andα(t)indicates the index of the class that is currently served at time t. The state of the system is(x1(t), x2(t), z,α(t), y), where z is a Boolean variable to indicate whether the setup time is reached, that is, z=1 when the setup is done and is 0 otherwise. The event set is{a1, a2, d1, d2, l1, l2, ss1, ss2}, where aiand diare the arrivals and departures of class i; li stands for the lot-forming event, which is caused by an arrival in the incoming batching queue that completes a lot and enables the server to start serving; and ssi represents the completion of the setup time for class i (one of the induced events defined earlier).
Following the general abstraction scheme, there are 8 aggregate states in this system, which we denote as ASj, j=1,. . ., 8:
State 1
State 2
State 4 State 3
yreset= 0 zreset= 0 y= θ2
y= θ1
yreset= 0 zreset= 0
z=s1
z=s2
x2=r2
z= 1 y= 0
z= 1 x1= r1
x1=r1
z= 1 y= 0 x2= r2
x2= x2= 0
x2= 0 x2> 0 x2> 0
0
r2– β2
r2 y = β2 z= 1
x2= r2 x1= 0 x1= 0
x1> 0
x1> 0
0
r1– β1
r1 y = β1 x1=
x1= r1
FIGURE 4.5 Stochastic hybrid automaton model for lot-sizing with a non-idling policy. (From Yao, C. and Cassandras, C.G., IEEE Trans. Automat. Sci. Eng., 9, 250, 2012. With permission.)
AS1: states in AS1satisfy
x1+y<θ1, z=0, α(t)=1 (4.30) In this aggregate state, the server is idle, and the active event set of every state in AS1is{a1, a2, l1, ss1}.
AS2: states in AS2satisfy
x1<θ1, z=1, α(t)=1 (4.31)
The setup time is up in this aggregate state, but the input batching queue does not have enough jobs to form a lot; hence, the server remains idle. The active event set is{a1, a2, l1}.
AS3: states in AS3satisfy
x1≥θ1, z=0, α(t)=1 (4.32)
In this aggregate state, the system is waiting for the setup to be done, although it already has enough jobs to form a lot. The active event set is{a1, a2, ss1}.
AS4: states in AS4satisfy
y<θ1, x1+y≥θ1, z=1, α(t)=1 (4.33) The server is serving jobs of class 1 in this aggregate state, until the final job in the current lot is complete, at which time the system transitions to AS5. The active event set is{a1, a2, d1}.
AS5: states in AS5satisfy
x2+y<θ2, z=0, α(t)=2 (4.34) This aggregate state is symmetric to AS1, with active event set{a1, a2, l2, ss2}.
AS6: states in AS6satisfy
x2 <θ2, z=1, α(t)=2 (4.35)
This aggregate state is symmetric to AS2, with active event set{a1, a2, l2}.
AS7: states in AS7satisfy
x2 ≥θ2, z=0, α(t)=2 (4.36)
This aggregate state is symmetric to AS3, with active event set{a1, a2, ss2}.
AS8: states in AS8satisfy
y<θ2, x2+y≥θ2, z=1, α(t)=2 (4.37) This aggregate state is symmetric to AS4, with active event set{a1, a2, d2}.
It is also easy to verify that criterion 2 in (4.2) is satisfied within all 8 aggregate states, as aievents always increase xi by 1, and di events always decrease xi and increase y both by 1. A high-level automaton model of the lot-sizing problem is shown in Figure 4.6. Figures 4.7 and 4.8 show the detailed transitions of typical interior states within AS1−AS4. Detailed transitions of interior states within AS5−AS8are symmetric to those shown in Figures 4.7 and 4.8.
Next, we abstract the discrete transitions within each of the 8 aggregate states to flow-like continuous dynamics. For AS1−AS3, the interior transitions are caused by events aiwhich can be abstracted as incoming flows with rate ri(t), so thatx˙i(t)=ri(t)and˙y(t)=0. In AS4, the server starts serving jobs from class 1, and that can be abstracted through an outflow with rateβ1(t), thus giving dynamicsx˙1(t) =r1(t)−β1(t),x˙2(t) =r2(t),y˙(t)= β1(t). Similarly, we can obtain the continuous dynamics in AS5−AS8. In addition, the clock variable can be naturally abstracted to represent the actual time, that is, z(t)represents the time since the system switches to the current class, and by definition,z˙(t)=1. The overall system dynamics are precisely those shown in (4.21) through (4.23). The stochastic hybrid automaton model of the resulting SFM is also exactly the same as the one in Figure 4.4 of Section 4.4.2.2.
l1
l1
l2
l2
d1
SS1
d2 AS2
AS4
SS1
AS3
SS2
SS2
AS1
AS7
AS8
AS6 AS5
FIGURE 4.6 Automaton model for the lot-sizing problem.
a1
x1+ 1, x2, 0
1, 0 x1+ 1, x2, 0
1, 0
x1≥ θ1, x2, 0 1, 0
x1< θ1, x2, 1 1, 0
x1, x2+ 1, 0 1, 0
x1, x2+ 1, 1 1, 0 x1+ 1, x2, 1
1, 0
x1< θ1, x2, 0 1, 0
x1, x2+ 1, 0 1, 0
a2 a2 a2
AS2
AS1 AS3
a1 a1
FIGURE 4.7 State transitions of typical interior states in AS1, AS2, AS3.
AS4
a2 a1
d1 x1, x2+ 1, 1
1, y x1+ 1, x2, 1
1, y
x1, x2, 1 1, y
x1–1, x2, 1 1, y+ 1
FIGURE 4.8 State transitions for typical interior states in AS4.
4.4.3 LOT-SIZEOPTIMIZATION
The lot-size optimization problem is defined by viewingθ=(θ1,. . .,θN)as a controllable parameter vector and seeking to optimize performance metrics of the form
J(θ; x(0), T)=E [L(θ; x(0), T)] (4.38)
where L(θ; x(0), T)is a sample function of interest evaluated in the interval [0, T] with initial conditions x(0). In the lot-sizing problem, we are typically interested in minimizing the mean
system time for each class or equivalently the mean workload (i.e., queue content) in the SFM. We will consider the sample performance function
L(θ; x(0), T)= N
i=1
γiQi(θ; x(0), T) (4.39)
whereγi ∈R+, i=1,. . ., N, are weight parameters which can be interpreted as unit holding costs of different classes, and
Qi(θ)= 1 T
T
0
[xi(t,θ)+1(a(t)=i)ãy(t)]dt (4.40)
where 1(m=n)is the usual indicator function such that 1(m=n)=1 if m =n and 0 otherwise.
Observe that 1(a(t)=i)ãy(t)represents the delay experienced by jobs in the output queue of the actual DES while waiting for the remaining jobs of the lot under process.
Because of the multiclass nature of this problem, as mentioned in the introduction, in addition to the usual system-centric optimization focusing on the objective J(θ; x(0), T), each class (user) may solve its own optimization problem with a performance metric of the form Ji(θ)=E[Qi(θ)]; this leads to a non-cooperative resource contention game that calls for solving N distinct user-centric optimization problems. We shall consider both in Section 4.4.4.
Since we do not wish to impose any limitations on the defining processes{ri(t)}and{βi(t)}(other than mild technical conditions), it is infeasible to obtain closed-form expressions for J(θ; x(0), T).
Therefore, we resort to iterative methods as already discussed in Section 4.3, which are driven by estimates of the cost function gradient with respect to the parameter vector of interest. Thus, we are interested in estimating dJ/dθibased on sample path data, where a sample path of the system may be directly observed or obtained through simulation. We then seek to obtainθ∗minimizing J(θ; x(0), T)through an iterative scheme of the form
θi,n+1=θi,n−ηnHi,n(θn; x(0), T,ωn), n=0, 1,. . . (4.41) where Hi,n(θi,n; x(0), T,ωn)is an estimate of dJ/dθievaluated atθ=(θ1,n,θ2,n,. . .,θN,n)and based on information obtained from a sample path denoted byωn. For our purposes, we shall consider T to be a fixed time horizon and evaluate performance over[0, T]. To simplify the analysis that follows, we will assume that xi(0)=0, for all i (in practice, it is possible to avoid this issue as explained, e.g., in [37]). In addition, we will omit in subsequent notation the initial condition, the observation interval T and the sample pathωnunless it is necessary to stress such dependence.
In order to execute an algorithm such as (4.41), we need to estimate Hi,n(θi,n), that is, the derivative dJ/dθi. The IPA approach is based on using the sample derivative dL/dθias an estimate of dJ/dθi. The strength of the approach is that dL/dθican be obtained from observable sample path data alone and, usually, in a very simple manner that can be readily implemented online. As already mentioned in Section 4.3, it is also the case that dL/dθi is an unbiased estimate of dJ/dθi, a property that allows us to use (4.41) in obtainingθ∗.
It is clear from (4.39) and (4.40) that obtaining sample derivatives of Qi(θ)requires the sample derivatives of the states xi(θ, t), y(θ, t)and of the event timesτk(θ)where the explicit dependence on the parameterθis included here for emphasis. This is precisely the task of IPA which we present in the next section.