Appendix: Maximum likelihood estimation

In this appendix we illustrate how maximum likelihood parameter estimation proceeds. For simplicity, we take the estimation of the person parameter in the Rasch model as our example.

Figure 9.10 Rasch item parameter estimates, a low-scoring group versus a high-scoring group. [Plot of b in group H against b in group L; both axes run from −3 to 3.]

The likelihood of a score pattern x = (x1,…, xn) in the Rasch model can be written as

\[
L(x \mid \theta) = \prod_{i=1}^{n} \frac{\exp[x_i(\theta - b_i)]}{1 + \exp(\theta - b_i)} = \exp(t\theta) \prod_{i=1}^{n} \exp(-b_i x_i) \prod_{i=1}^{n} [1 + \exp(\theta - b_i)]^{-1} \tag{9.37}
\]

where t is the total score. We want to find the value of θ that maximizes Equation 9.37. Maximizing the likelihood is equivalent to maximizing the logarithm of the likelihood. So, instead of maximizing Equation 9.37, we maximize its logarithm:

\[
\ln L(x \mid \theta) = t\theta - \sum_{i=1}^{n} b_i x_i - \sum_{i=1}^{n} \ln[1 + \exp(\theta - b_i)] \tag{9.38}
\]

where ln is the natural logarithm. When ln L(x|θ) attains its maximum as a function of θ, the derivative of ln L(x|θ) with respect to θ is equal to 0. So we can find the ML estimate of θ by differentiating Equation 9.38 with respect to θ and setting the result equal to 0 (we must check that we have obtained a maximum of the function and not a minimum).
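The likelihood and its logarithm can also be evaluated numerically. The following is a minimal sketch, not part of the original text: the function names are illustrative, and the item parameters (b1 = −0.5, b2 = 0.0, b3 = 0.5) and the two score patterns with total score 2 are taken from the example used later in this appendix. It locates the maximum of the log-likelihood by a simple grid search rather than by calculus.

```python
import math

def rasch_likelihood(pattern, b, theta):
    """Likelihood of a 0/1 score pattern under the Rasch model (Equation 9.37)."""
    value = 1.0
    for x_i, b_i in zip(pattern, b):
        p = 1.0 / (1.0 + math.exp(-(theta - b_i)))  # P_i(theta)
        value *= p if x_i == 1 else (1.0 - p)
    return value

def rasch_log_likelihood(pattern, b, theta):
    """Log-likelihood written in the form of Equation 9.38."""
    t = sum(pattern)  # total score
    return (t * theta
            - sum(b_i * x_i for b_i, x_i in zip(b, pattern))
            - sum(math.log(1.0 + math.exp(theta - b_i)) for b_i in b))

# Assumed example values: three items, two patterns with total score 2.
b = [-0.5, 0.0, 0.5]
grid = [k / 1000.0 for k in range(-1500, 1501)]  # theta from -1.5 to 1.5

best_110 = max(grid, key=lambda th: rasch_log_likelihood([1, 1, 0], b, th))
best_101 = max(grid, key=lambda th: rasch_log_likelihood([1, 0, 1], b, th))
print(best_110, best_101)
```

Because the two log-likelihoods differ only in the term that does not depend on θ, the grid search should return the same maximizing value for both patterns, even though the likelihoods themselves differ.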

Differentiating Equation 9.38 with respect to θ and setting the result equal to 0 gives the equation

\[
g(\theta) = t - \sum_{i=1}^{n} \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)} = t - \sum_{i=1}^{n} P_i(\theta) = 0 \tag{9.39}
\]

This equation is identical to Equation 9.17.

In Figure 9.11, the likelihood, the logarithm of the likelihood, and the derivative of ln L(x|θ) are displayed for a simple example. In Figure 9.11a, we can see that two response patterns with identical total scores have very different values for the likelihood, but also that for both response patterns the maximum is obtained for the same value of θ (0.721). In Figure 9.11b, ln L(x|θ) is given. From Figure 9.11a and Figure 9.11b, it is clear that the value of θ that maximizes L(x|θ) also maximizes ln L(x|θ). For this value of θ, g(θ) in Figure 9.11c equals 0.0. In Figure 9.11c, we see that the slope of g(θ) is negative. This shows that we obtained a maximum instead of a minimum of ln L(x|θ).

We find θ by solving Equation 9.39. The solution is a maximum of ln L(x|θ) if g(θ) decreases with increasing θ in the neighborhood of g(θ) = 0.0. In other words, the derivative of g(θ) with respect to θ, the second-order derivative of ln L, must be negative. The derivative of g(θ), g′(θ), can be used in the estimation procedure. This is done in the Newton–Raphson procedure.

Let us demonstrate the estimation procedure. We approximate the function g(θ) at the value of the estimated θ in iteration k, θk, by a straight line. This line goes through the point (θk, g(θk)) and has a slope equal to g′(θk). The equation of this line is

\[
g(\theta) = g(\theta_k) + g'(\theta_k)(\theta - \theta_k) \tag{9.40}
\]

Setting g(θ) equal to 0 gives us the next estimate of θ:

\[
\theta_{k+1} = \theta_k - \frac{g(\theta_k)}{g'(\theta_k)} \tag{9.41}
\]
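The linear approximation behind Equation 9.41 can be tried out directly. The sketch below is an illustration under stated assumptions, not the text's own procedure: it approximates the slope g′(θk) by a central finite difference instead of an analytic formula, the helper names `g` and `slope` are hypothetical, and the item parameters and total score are those of the three-item example in this appendix.

```python
import math

def g(theta, b, t):
    """Derivative of ln L with respect to theta (Equation 9.39)."""
    return t - sum(1.0 / (1.0 + math.exp(-(theta - b_i))) for b_i in b)

def slope(theta, b, t, h=1e-5):
    """Central finite-difference approximation of g'(theta) (assumed step h)."""
    return (g(theta + h, b, t) - g(theta - h, b, t)) / (2.0 * h)

b = [-0.5, 0.0, 0.5]  # item parameters of the example
t = 2                 # total score
theta_k = 0.0         # current estimate

line_slope = slope(theta_k, b, t)
# Root of the straight line through (theta_k, g(theta_k)): Equation 9.41.
theta_next = theta_k - g(theta_k, b, t) / line_slope
print(theta_next, line_slope)
```

A negative value of `line_slope` indicates that g(θ) is decreasing at θk, which is the condition for a maximum discussed above.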

Figure 9.11 The likelihood L(x|θ), ln L(x|θ), and the derivative of ln L(x|θ) with respect to θ for a test with b1 = −0.5, b2 = 0.0, b3 = 0.5, and a total score equal to 2. (a) L(x|θ); the likelihoods of two of the three response patterns with a total score equal to 2 are given. (b) ln L(x|θ) for score pattern 110. (c) g(θ). [Three plots; the horizontal axis of each is θ, running from −1.5 to 1.5.]

In the Newton–Raphson method, we need g′(θ), the derivative of g(θ) in Equation 9.39:

\[
g'(\theta) = -\sum_{i=1}^{n} P_i(\theta)[1 - P_i(\theta)] \tag{9.42}
\]

The derivative of g(θ) is equal to minus the test information. This relation between the second-order derivative of the log likelihood, g′(θ), and the test information does not hold for the 3PL model. With this model, the test information only equals minus the expected value of g′(θ) (Kendall and Stuart, 1961, pp. 8–9). In the 3PL model, the test information is easier to compute than the second-order derivative g′(θ), so with this model we replace the second derivative in Equation 9.40 by minus the test information (using Fisher’s method of scoring instead of the Newton–Raphson procedure).

The iterative procedure for the estimation of θ in the Rasch model is as follows:

A. We compute a starting value for θ, θ0.

B. We compute a new value θ1 by application of Equation 9.41:

\[
\theta_1 = \theta_0 - \frac{t - \sum_{i=1}^{n} P_i(\theta_0)}{-\sum_{i=1}^{n} P_i(\theta_0)[1 - P_i(\theta_0)]} \tag{9.43}
\]

C. We compute |θ1 − θ0|, the absolute value of the difference between the two consecutive estimates of θ.

D. If the value obtained in step C is below a chosen threshold value ε, we stop. The obtained value θ1 is our final ML estimate. If the difference exceeds the threshold, we replace θ0 by θ1 and repeat steps B and C. This process is repeated until we reach convergence. Note that with an inadequate starting value, the procedure may fail to converge.

Let us give a numerical example of the method. We have three items, with b1 = −0.5, b2 = 0.0, and b3 = 0.5. The total score t equals 2. As the starting value, we choose θ = 0.0. The final estimate of θ is 0.721, obtained in the second iteration. The data of the iteration process are given in Table 9.1. You might want to verify these figures yourself using a spreadsheet.
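As an alternative to a spreadsheet, steps A to D can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function names, the threshold ε = 0.001, and the iteration cap are assumptions, and the guard against zero and perfect scores is added here because no finite ML estimate exists in those cases.

```python
import math

def prob(theta, b_i):
    """Rasch probability of a correct response, P_i(theta)."""
    return 1.0 / (1.0 + math.exp(-(theta - b_i)))

def estimate_theta(b, t, theta0=0.0, eps=1e-3, max_iter=50):
    """Newton-Raphson ML estimate of theta; returns (estimate, history)."""
    if not 0 < t < len(b):
        # Assumption made explicit: zero and perfect scores have no finite estimate.
        raise ValueError("no finite ML estimate for a zero or perfect score")
    theta = theta0                                       # step A
    history = []
    for k in range(max_iter):
        p = [prob(theta, b_i) for b_i in b]
        g = t - sum(p)                                   # Equation 9.39
        g_prime = -sum(p_i * (1.0 - p_i) for p_i in p)   # Equation 9.42
        theta_new = theta - g / g_prime                  # step B, Equation 9.41
        history.append((k, theta, g, g_prime, theta_new))
        if abs(theta_new - theta) < eps:                 # steps C and D
            return theta_new, history
        theta = theta_new
    raise RuntimeError("Newton-Raphson did not converge")

estimate, history = estimate_theta([-0.5, 0.0, 0.5], t=2)
for k, theta_k, g_k, gp_k, theta_next in history:
    print(f"k = {k}  {theta_k:.5f}  {g_k:.5f}  {gp_k:.5f}  {theta_next:.5f}")
```

Each printed row corresponds to one row of the iteration history, so the output can be compared column by column with Table 9.1.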

Exercises

9.1 Compute the probability of a correct response for a Rasch item with item parameter equal to 0.0 and person parameter θ = −2.0 (0.5) 2.0, that is, for θ running from −2.0 to 2.0 in steps of 0.5.

9.2 We have the responses of two homogeneous groups of persons on two items. The response probabilities are P1(θ1) = 0.3775, P1(θ2) = 0.6225, P2(θ1) = 0.4378, and P2(θ2) = 0.7112. Estimate the person parameters θ1 and θ2 on the basis of the response probabilities for the first item, assuming that the Rasch model fits. Use the response probabilities for the second item for verifying whether the Rasch model really fits.

9.3 Given is a test with three Rasch items. The item parameters are b1 = −0.5, b2 = 0.0, and b3 = 0.5. A person has answered items 1 and 2 correctly, and item 3 incorrectly. Compute the likelihood for θ = −1.0, −0.5, 0.0, 0.5, 1.0.

a. Consider the four intervals defined by the five values of θ for which the likelihood has been computed. In which interval lies the maximum likelihood estimate of θ?

b. Assume that we have a population distribution with five latent classes: θ1 = −1.0, θ2 = −0.5, θ3 = 0.0, θ4 = 0.5, θ5 = 1.0. Also assume that these latent classes have the same relative frequencies: g(θk) = 0.2 for k = 1,…, 5. Compute the EAP estimate of θ.

9.4 We have three items with item parameters:

b1 = 0.5, a1 = 1.0, c1 = 0.0
b2 = 0.5, a2 = 2.0, c2 = 0.0
b3 = 0.5, a3 = 2.0, c3 = 0.25

Compute the item informations at θ = 0.0.

Table 9.1 Iteration history.

Iteration   θk        g(θk)     g′(θk)     θk+1
k = 0       0.0       0.5       −0.72001   0.69444
k = 1       0.69444   0.01706   −0.64820   0.72075
k = 2       0.72075   0.00007   −0.64304   0.72086

9.5 We have a discrete distribution of θ with values −1.0, −0.5, 0.0, 0.5, and 1.0. The following is known:

Value θ   Frequency f(θ)   I(θ)
−1.0      0.1              7.864
−0.5      0.2              9.400
 0.0      0.4              10.000
 0.5      0.2              9.400
 1.0      0.1              7.864

Compute the reliability of the test when maximum likelihood is used for the estimation of θ.
