5.3.1 The crossed design
Consider a universe with one facet: items. We assume that the facet is a random facet. For person p, the observed score on item i is Xpi. When we want to generalize over the facet, we must take the expectation of Xpi over items. This expectation defines the universe score:
μp ≡ Ei Xpi (5.1)
The universe score is comparable to the true score in classical test theory. In generalizability theory, it is assumed that the persons are a random sample from a large population (formally: Np=∞). Analogous to Equation 5.1, we can define the population mean of item i as
μi ≡ Ep Xpi (5.2)
The expectation of the universe scores is μ. This universe mean is also the expectation of the population means μi.
With these definitions, we can decompose the observed score Xpi into a number of components:
Xpi = μ Grand mean
 + (μp − μ) Person effect
 + (μi − μ) Item effect
 + (Xpi − μp − μi + μ) Residual (5.3)
So, the observed score can be written as the sum of the grand mean, a person effect, an item effect, and a residual. The residual can be thought of as a combination of pure measurement error and the interaction between the item and the person. But, for lack of replications, these two sources of variation are confounded.
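To make the decomposition concrete, here is a small numerical sketch; the score matrix is hypothetical, not taken from the text:

```python
import numpy as np

# Hypothetical scores of 4 persons on 3 items (illustrative data only).
X = np.array([[5., 3., 4.],
              [7., 6., 6.],
              [4., 2., 3.],
              [8., 7., 9.]])

grand = X.mean()                      # grand mean (estimates mu)
person = X.mean(axis=1) - grand       # person effects (mu_p - mu)
item = X.mean(axis=0) - grand         # item effects (mu_i - mu)
residual = X - grand - person[:, None] - item[None, :]

# The four components add back up to the observed scores exactly.
assert np.allclose(grand + person[:, None] + item[None, :] + residual, X)
```

By construction the person and item effects each sum to zero over the sample, mirroring the deviations-from-the-mean form of Equation 5.3.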
In Figure 5.2 a Venn diagram representation is given of the p× i design. An advantage of such a representation is that it visualizes the variance components involved in the decomposition of the observed scores. The interaction can be found in the segment where the circles for persons and items overlap.
The population variance of universe scores or person effects is called the variance component for persons and is written as σ²p. We also have a variance component for items, σ²i, and a residual variance component, σ²pi,e. The notation of the residual reflects the confounding of the random error and the interaction. The variance of Xpi over p and i, Ep,i(Xpi − μ)², is

σ²(Xpi) = σ²p + σ²i + σ²pi,e (5.4)

Figure 5.2 Venn diagram representation of the p × i design.

The three variance components can be estimated from an analysis of variance (ANOVA) of a two-way design. Statistical packages are available for ANOVA analyses and the estimation of variance components
(e.g., SAS, www.sas.com; SPSS, www.spss.com). There is also a suite of computer programs for generalizability analyses, GENOVA, which can be obtained for free from the University of Iowa, College of Education (Brennan, 2001).
It should be noted that in ANOVA terminology two ways are distinguished: one represents the units of measurement (i.e., the persons in G theory), and the other the facet, items. So a two-way ANOVA design corresponds to a one-facet G study.
The observations in a crossed design can be written as in Figure 5.3.
In the rightmost column, we have the averages for persons, averaging over items. In the bottom row, we have the average scores for items, averaging over persons.
In order to compute the variance components for this p × i design, we use the ANOVA machinery. We start by calculating the sums of squares for persons and items, respectively. For the computation of the sum of squares for persons, we replace each entry xpi in the row for a person by the average score for this person. Next we take the squared deviations from the grand mean and sum these squared deviations. The sum of squares for persons is
SSp = ni Σp (x̄p. − x̄..)² (5.5)

Figure 5.3 Observations in a crossed p × i design with np persons and ni items. [Rows are persons, with entries xp1 … xpni; the rightmost column holds the person means x1., …, xp., …, xnp.; the bottom row holds the item means x.1 … x.ni and the grand mean x.. .]

The mean square for persons is obtained by dividing the sum of squares for persons by the degrees of freedom corresponding to this sum of squares, np − 1. The mean square for persons is equal to ni
times the variance of the mean scores and is equal to the total-score variance divided by ni.
The sum of squares for items is obtained in a similar way. The simplest way to obtain the sum of squares for the residual is to compute the total sum of squares and to subtract the sum of squares for persons and the sum of squares for items. The complete ANOVA is summarized in Table 5.1. The rightmost column in this table gives the expected mean squares for the random-effects model. In the expected mean squares for persons, all variance components related to persons are included as well as the random error. This is due to the fact that the model is a random model. In a model with fixed effects, the interactions for a particular person would have summed to 0. In the model with random effects, the ni interactions are a random sample from all possible interactions for the person. The coefficient of the variance component for persons is ni. This coefficient is equal to the number of observations in which the person effect is involved.
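The computations just described can be sketched as follows, again with a hypothetical score matrix:

```python
import numpy as np

# Hypothetical 4 x 3 person-by-item score matrix (illustrative only).
X = np.array([[5., 3., 4.],
              [7., 6., 6.],
              [4., 2., 3.],
              [8., 7., 9.]])
n_p, n_i = X.shape
grand = X.mean()

SS_p = n_i * ((X.mean(axis=1) - grand) ** 2).sum()   # Equation 5.5
SS_i = n_p * ((X.mean(axis=0) - grand) ** 2).sum()   # sum of squares for items
SS_total = ((X - grand) ** 2).sum()
SS_res = SS_total - SS_p - SS_i                      # residual by subtraction

MS_p = SS_p / (n_p - 1)
MS_i = SS_i / (n_i - 1)
MS_res = SS_res / ((n_p - 1) * (n_i - 1))
```

Note that MS_p equals ni times the sample variance of the person mean scores, as stated in the text.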
MSpi,e estimates σ²pi,e. From Table 5.1 we obtain an estimate of σ²p:

σ̂²p = (MSp − MSpi,e)/ni (5.6)

Table 5.1 ANOVA of the crossed p × i design.

Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) | Expected Mean Square (EMS)
Persons (p) | SSp | np − 1 | MSp = SSp/dfp | σ²pi,e + ni σ²p
Items (i) | SSi | ni − 1 | MSi = SSi/dfi | σ²pi,e + np σ²i
Residual | SSpi,e | (np − 1)(ni − 1) | MSpi,e = SSpi,e/dfpi,e | σ²pi,e
Total | ΣΣ(xpi − x..)² | | |

The generalizability coefficient for the ni-item test, universe-score variance divided by observed variance, is

Eρ²Rel = σ²p/(σ²p + σ²pi,e/ni) (5.7)
also known as the stepped-up intraclass correlation. The expectation sign indicates that in this formula an approximation is given to the expected squared correlation of observed scores and universe scores.
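Equations 5.6 and 5.7 can be applied directly to the mean squares; a sketch with hypothetical data:

```python
import numpy as np

# Hypothetical 4 x 3 person-by-item score matrix (illustrative only).
X = np.array([[5., 3., 4.],
              [7., 6., 6.],
              [4., 2., 3.],
              [8., 7., 9.]])
n_p, n_i = X.shape
grand = X.mean()
SS_p = n_i * ((X.mean(axis=1) - grand) ** 2).sum()
SS_i = n_p * ((X.mean(axis=0) - grand) ** 2).sum()
SS_res = ((X - grand) ** 2).sum() - SS_p - SS_i
MS_p = SS_p / (n_p - 1)
MS_res = SS_res / ((n_p - 1) * (n_i - 1))

var_p = (MS_p - MS_res) / n_i             # Equation 5.6: estimate of sigma^2_p
g_coef = var_p / (var_p + MS_res / n_i)   # Equation 5.7 with estimates plugged in
```

Substituting the mean-square estimates into Equation 5.7 gives the same value as the mean-square form (MSp − MSpi,e)/MSp, since var_p + MS_res/ni reduces to MS_p/ni.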
The coefficient is the generalizability counterpart of the reliability coefficient. Its size gives information on the accuracy with which comparisons between persons can be made. The coefficient concerns relative measurements, and this is denoted by Rel (Shavelson and Webb, 1991). The estimate of Equation 5.7 in terms of mean squares is
Eρ̂²Rel = (MSp − MSpi,e)/MSp (5.8)

The mean squares in Equation 5.8 can be written in terms of the total-score variance and the item variances. If we do so, we can derive that Equation 5.8 is identical to coefficient α, the coefficient known as a lower bound to reliability. This implies that in the case of congeneric items, generalizability theory underestimates generalizability or reliability.
The problem is due to the fact that the true-scale differences between congeneric measurements are taken up into the interaction term in the score decomposition (Equation 5.3).
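The identity between Equation 5.8 and coefficient α can be verified numerically; a sketch with hypothetical data:

```python
import numpy as np

# Hypothetical 4 x 3 person-by-item score matrix (illustrative only).
X = np.array([[5., 3., 4.],
              [7., 6., 6.],
              [4., 2., 3.],
              [8., 7., 9.]])
n_p, n_i = X.shape
grand = X.mean()
SS_p = n_i * ((X.mean(axis=1) - grand) ** 2).sum()
SS_i = n_p * ((X.mean(axis=0) - grand) ** 2).sum()
SS_res = ((X - grand) ** 2).sum() - SS_p - SS_i
MS_p = SS_p / (n_p - 1)
MS_res = SS_res / ((n_p - 1) * (n_i - 1))

g_coef = (MS_p - MS_res) / MS_p  # Equation 5.8

# Coefficient alpha from the total-score variance and the item variances.
total = X.sum(axis=1)
alpha = n_i / (n_i - 1) * (1 - X.var(axis=0, ddof=1).sum() / total.var(ddof=1))

assert np.isclose(g_coef, alpha)  # Equation 5.8 is identical to alpha
```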
5.3.2 The nested i : p design
In the one-facet i : p design, each person is presented with a different set of items. This situation is schematized in Figure 5.4. It is clear from the figure that the data matrix is incomplete.
Figure 5.4 Data matrix and Venn diagram for the nested i : p design.
[(a) Data matrix for the i : p design: crosses mark the items presented to each person. (b) Venn diagram with the components p and i,pi,e.]
Only two variance components can be estimated. The ANOVA for the nested design is given in Table 5.2.
With ni items the generalizability coefficient is

ρ²Rel = σ²p/(σ²p + σ²i,pi,e/ni) (5.9)

which is estimated by

ρ̂²Rel = (MSp − MSi,pi,e)/MSp (5.10)
Notice the difference between the left-hand sides of Equation 5.7 and Equation 5.9. In the nested design the ratio of variance components equals the generalizability coefficient. For more on this design see Jackson (1973).
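The estimation for the nested design can be sketched in the same way. The data are hypothetical, each row holding the scores of one person on his or her own item set, and the estimators are the standard random-effects mean-square formulas:

```python
import numpy as np

# Hypothetical nested i : p data: each of 3 persons answers a different
# set of 4 items (illustrative only).
X = np.array([[6., 5., 7., 6.],
              [3., 4., 2., 3.],
              [8., 7., 9., 8.]])
n_p, n_i = X.shape
grand = X.mean()

MS_p = n_i * ((X.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
SS_within = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum()
MS_within = SS_within / (n_p * (n_i - 1))   # items within persons: i,pi,e

var_p = (MS_p - MS_within) / n_i            # estimate of sigma^2_p
g_coef = (MS_p - MS_within) / MS_p          # estimated coefficient (5.10)
```

Only the person component and the confounded i,pi,e component appear, because with a different item set per person the item effect cannot be separated from the interaction and error.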