Linear equating with an anchor test

In Design 3 we have an internal or external anchor test V that has the same function as test forms X and Y. By means of this common anchor test, tests X and Y can be equated. First, we must define the population for which the equating relation is to hold. This population, the so-called synthetic population, might be defined as the combined group A + B, but other population definitions are possible (Kolen and Brennan, 1995, pp. 106–111). Using the data for tests X and V, and for tests Y and V, we can estimate the means and standard deviations of X and Y in the synthetic population. Next, Equations 11.2 through 11.4 are used to define the equivalence relationship between tests X and Y.

Several methods for estimating the means and standard deviations of tests X and Y in the synthetic population are available. Tucker proposed a method for linear equating that can be used if groups A and B do not differ much in ability. The resulting equating equation is formally identical to a result obtained by Lord under the assumption of random groups A and B. Levine developed two methods for samples that may differ widely in ability. The first method can be used with equally reliable tests. The second method is suitable when the test forms differ in reliability. In the second case, the true scores can be placed on a common scale, but raw scores obviously cannot be equated.

The first method is described in Exhibit 11.2. The second method is easier to derive than the first Levine method or the Tucker method (for linear observed score equating approaches, see Von Davier and Kong, 2005). Because the second Levine method is defined for true scores, the computation of means and standard deviations of test scores for the synthetic population can be avoided.

Exhibit 11.2 Levine’s first method: Equally reliable tests

With equally reliable tests, the mean and standard deviation of test forms X and Y are estimated for a synthetic group. Here, we estimate the mean and variance of X and Y for the total group T = A + B. The procedure is illustrated for test form X.

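Two assumptions are made with respect to test form X and the common test V. The first assumption is that the true scores of X and V are linearly related. This assumption leads to two equations (the subscripts $\mathrm{T}_X$ and $\mathrm{T}_V$ denote true scores; $(A)$ and $(T)$ denote the group):

$$\mu_{\mathrm{T}_X}(T) - \frac{\sigma_{\mathrm{T}_X}(T)}{\sigma_{\mathrm{T}_V}(T)}\,\mu_{\mathrm{T}_V}(T) = \mu_{\mathrm{T}_X}(A) - \frac{\sigma_{\mathrm{T}_X}(A)}{\sigma_{\mathrm{T}_V}(A)}\,\mu_{\mathrm{T}_V}(A) \tag{a}$$

and

$$\frac{\sigma_{\mathrm{T}_X}(T)}{\sigma_{\mathrm{T}_V}(T)} = \frac{\sigma_{\mathrm{T}_X}(A)}{\sigma_{\mathrm{T}_V}(A)} \tag{b}$$

that is, the intercept and the slope of the relation between the true scores of X and V are the same in groups A and T.

The second assumption is that the variance of measurement errors for test form X is the same in group A and group T,

$$\sigma_X^2(T)\,[1 - \rho_{XX'}(T)] = \sigma_X^2(A)\,[1 - \rho_{XX'}(A)] \tag{c}$$

Using (a) and (b) and substituting the observed mean for the true score mean, we obtain

$$\bar{x}(T) = \bar{x}(A) + \frac{\sigma_{\mathrm{T}_X}(A)}{\sigma_{\mathrm{T}_V}(A)}\,[\bar{v}(T) - \bar{v}(A)]$$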

Using (b) and (c), we obtain

$$\sigma_X^2(T) = \sigma_X^2(A) + \frac{\sigma_{\mathrm{T}_X}^2(A)}{\sigma_{\mathrm{T}_V}^2(A)}\,[\sigma_{\mathrm{T}_V}^2(T) - \sigma_{\mathrm{T}_V}^2(A)]$$

The observed-score variance of test form X in the total group can be obtained when the ratio between the true-score variance of X and the true-score variance of V is known. An estimate of this ratio is presented in the main text.

Next, the mean and variance of test form Y in the total group are estimated. Finally, Equations 11.2 through 11.4 are used to equate test forms X and Y.

In the second Levine method, it is assumed that the true scores on X and V are linearly related, and similarly that the true scores on Y and V are linearly related. A true score on test X, $\mathrm{T}_X$, is equivalent to a true score on test V, $\mathrm{T}_V$, if the two scores have the same z-score within the same group of persons, say group A:

$$\frac{\mathrm{T}_X - \mu_{\mathrm{T}_X}(A)}{\sigma_{\mathrm{T}_X}(A)} = \frac{\mathrm{T}_V - \mu_{\mathrm{T}_V}(A)}{\sigma_{\mathrm{T}_V}(A)} \tag{11.5}$$

This equation can be rewritten as follows:

$$\mathrm{T}_X = \gamma_{XV}\,[\mathrm{T}_V - \bar{v}(A)] + \bar{x}(A) \tag{11.6}$$

where observed-score means are substituted for true-score means. In the equation, $\gamma_{XV}$ denotes the ratio between the true-score standard deviation on X and the true-score standard deviation on V. This ratio is assumed to be group independent. A similar equation can be obtained for the relation between true scores on Y and true scores on V. Coefficients for the equation relating Y and V can be obtained from group B:

$$\mathrm{T}_V = \frac{\mathrm{T}_Y - \bar{y}(B)}{\gamma_{YV}} + \bar{v}(B) \tag{11.7}$$

Substitution of $\mathrm{T}_V$ from Equation 11.7 in Equation 11.6 produces the following relationship between the true-score scales of tests X and Y:

$$\mathrm{T}_X = \frac{\gamma_{XV}}{\gamma_{YV}}\,[\mathrm{T}_Y - \bar{y}(B)] + \gamma_{XV}\,[\bar{v}(B) - \bar{v}(A)] + \bar{x}(A) \tag{11.8}$$

Next, raw scores on X and Y are equated as if they were true scores.

The correction for the difference in ability level between groups A and B is the first difference from Equations 11.2 through 11.4.

The second difference has to do with $\gamma$. Angoff (1971) called the ratio of true-score standard deviations $\gamma$ the effective test length. He assumed that test X can be regarded as a combination of $\gamma_{XV}$ tests parallel to anchor test V. Similarly, test Y might be regarded as a test composed of $\gamma_{YV}$ tests parallel to test V. This is a stronger requirement than the requirement that the three tests X, Y, and V have linearly related true scores. With Angoff’s assumption, the coefficients $\gamma$ can easily be determined. If test V is included in test X, $\gamma_{XV}$ is computed as follows:

$$\gamma_{XV} = \frac{s_x}{s_v\,r_{xv}} \tag{11.9}$$

In this case, the factor $\gamma$ equals the inverse of the regression coefficient for the regression of V on X (Angoff, 1953). The coefficient is estimated from the responses in group A. The factor $\gamma_{YV}$ is estimated from the responses in group B. If V is an external test, an equation other than Equation 11.9 is needed. Standard errors for the Levine procedure are given by Hanson, Zeng, and Kolen (1993).

When the common test V does not have the same function as test forms X and Y, equating is possible when the two groups A and B are random samples, using the method proposed by Lord (1955). If this is not the case, equating is not possible, but it is still possible to obtain comparable scores for test forms X and Y. Scores on X and Y might be defined as comparable if they are predicted by the same score on V.

The definition of comparable scores as scores that are predicted by the same score on a third test is not the only definition possible. There are other definitions of comparability. The issue of comparability of scores is discussed at some length by Angoff (1971, pp. 590–597).
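As a rough illustration of the second Levine method, the Python sketch below applies Equation 11.8 to raw scores, with both effective test lengths estimated by Equation 11.9. It assumes an internal anchor (V is part of both X and Y), so that Equation 11.9 applies in each group. The function names and the small data set are hypothetical and serve only to show the arithmetic; they do not come from the source.

```python
from statistics import mean


def sample_cov(a, b):
    """Unbiased sample covariance of two equally long lists of scores."""
    mean_a, mean_b = mean(a), mean(b)
    products = ((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return sum(products) / (len(a) - 1)


def gamma_internal(total, anchor):
    """Effective test length for an internal anchor (Equation 11.9).

    s_x / (s_v * r_xv) is algebraically equal to var(total) / cov(total, anchor),
    the inverse of the slope for the regression of the anchor on the total score.
    """
    return sample_cov(total, total) / sample_cov(total, anchor)


def levine_true_score_equate(y, x_a, v_a, y_b, v_b):
    """Place a score y on form Y on the scale of form X (Equation 11.8).

    x_a, v_a: total and anchor scores in group A, which took form X.
    y_b, v_b: total and anchor scores in group B, which took form Y.
    """
    g_xv = gamma_internal(x_a, v_a)  # estimated in group A
    g_yv = gamma_internal(y_b, v_b)  # estimated in group B
    return (
        (g_xv / g_yv) * (y - mean(y_b))
        + g_xv * (mean(v_b) - mean(v_a))
        + mean(x_a)
    )


# Hypothetical scores, invented for illustration only.
x_a = [10, 12, 15, 18, 20]  # form X totals, group A
v_a = [3, 4, 5, 6, 7]       # anchor scores, group A
y_b = [11, 14, 16, 19, 25]  # form Y totals, group B
v_b = [4, 5, 6, 7, 8]       # anchor scores, group B

for y in (11, 17, 25):
    print(y, round(levine_true_score_equate(y, x_a, v_a, y_b, v_b), 2))
```

When both groups have the same anchor mean, the middle term vanishes and only the rescaling by the ratio of the two effective test lengths remains. With an external anchor, `gamma_internal` would have to be replaced by a different estimator, as noted in the text.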
