Chapter 7 Validity and Validation of Tests
7.6 Convergent and discriminant validation: A strategy
Evidence for construct-related validity can be obtained in a number of ways (i.e., using several research designs and corresponding statistical methods for data analysis). These methods of construct validation, proposed by Cronbach and Meehl (1955) and, more specifically, by Campbell and Fiske (1959), may be summarized as follows:
• The study of group differences: If we expect two or more groups to differ on a test purportedly measuring a construct, this expectation may be tested directly, resulting in evidence for construct-related validity.
• The study of correlations between tests and factor analysis: If two or more tests are presumed to measure some construct, then a factor analysis of the correlation matrix should reveal one underlying factor as an indicator of the common construct.
• Studies of internal structure: For many constructs, evidence of homogeneity within the test is relevant in judging validity.
• Studies of change over occasions: The stability of test scores (i.e., retest reliability) may be relevant to construct validation.
• Studies of process: Observing a person's process of performance is one of the best ways of determining what accounts for variability on a test (see, e.g., Cronbach and Meehl, 1955; Cronbach, 1990). In addition, judgment and logical analysis are recommended in interpretations employing constructs (cf. Cronbach, 1971, p. 475).
• An important elaboration and extension of the study of correlations between tests is Campbell and Fiske's convergent and discriminant validation by the multitrait–multimethod matrix.
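The second point in the list above can be made concrete with a small sketch. Given a correlation matrix for three tests presumed to measure one construct (the values below are invented for illustration), an eigendecomposition of the matrix should show a single dominant factor:

```python
import numpy as np

# Hypothetical correlation matrix for three tests presumed to
# measure the same construct (illustrative values only).
R = np.array([
    [1.00, 0.64, 0.56],
    [0.64, 1.00, 0.60],
    [0.56, 0.60, 1.00],
])

# Eigendecomposition of the correlation matrix: a single dominant
# eigenvalue suggests one underlying common factor.
eigenvalues = np.linalg.eigvalsh(R)[::-1]  # sorted from high to low
explained = eigenvalues / eigenvalues.sum()

print("Eigenvalues:", np.round(eigenvalues, 3))
print("Proportion of variance, first factor:", round(explained[0], 3))
```

One eigenvalue well above 1, with the remaining eigenvalues well below 1, is consistent with a single common factor; in practice a proper factor analysis, not raw eigenvalues, would of course be reported.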
7.6.1 The multitrait–multimethod approach
Suppose we have a few different measurement instruments for a trait. We might expect these measurements to correlate. If there really is an underlying trait, the correlations should not be too low. On the other hand, correlations between measurement instruments for different traits should not be too high; otherwise it makes no sense to distinguish between the traits. In many investigations, several traits are measured using the same kind of instrument, for example, a questionnaire. This might pose a problem. When two traits measured with the same method correlate, we might wonder to what extent the correlation is due to the covariation of the traits and to what extent it is due to the use of a single measurement method. Campbell and Fiske (1959) proposed to use the multitrait–multimethod matrix research design in order to study the convergence of trait indicators and the discriminability of traits in validation studies. A hypothetical example with three traits and three methods is given in Figure 7.6.
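The method-variance confound just described can be illustrated with a small simulation (illustrative only; the factor weights are arbitrary assumptions, not taken from the chapter). Two traits that are truly uncorrelated still yield correlated scores when both measures share the same method factor:

```python
import numpy as np

# Simulate two uncorrelated traits measured with one shared method.
rng = np.random.default_rng(0)
n = 10_000

trait1 = rng.normal(size=n)   # true score, trait 1
trait2 = rng.normal(size=n)   # true score, trait 2 (independent of trait 1)
method = rng.normal(size=n)   # method effect shared by both measures

# Observed scores: trait + common method effect + random error.
x1 = trait1 + 0.6 * method + rng.normal(scale=0.5, size=n)
x2 = trait2 + 0.6 * method + rng.normal(scale=0.5, size=n)

r = np.corrcoef(x1, x2)[0, 1]
print(f"Correlation between the two measures: {r:.2f}")
```

Although the traits themselves are uncorrelated, the observed correlation is clearly positive; it reflects method variance alone, which is exactly the artifact the multitrait–multimethod design is meant to expose.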
Figure 7.6 The multitrait–multimethod correlation matrix with three methods and three traits.

                           Method a                       Method b                       Method c
                   Trait 1  Trait 2  Trait 3     Trait 1  Trait 2  Trait 3     Trait 1  Trait 2  Trait 3
Method a  Trait 1  r11(aa)  r12(aa)  r13(aa)     r11(ab)  r12(ab)  r13(ab)     r11(ac)  r12(ac)  r13(ac)
          Trait 2           r22(aa)  r23(aa)     r21(ab)  r22(ab)  r23(ab)     r21(ac)  r22(ac)  r23(ac)
          Trait 3                    r33(aa)     r31(ab)  r32(ab)  r33(ab)     r31(ac)  r32(ac)  r33(ac)
Method b  Trait 1                                r11(bb)  r12(bb)  r13(bb)     r11(bc)  r12(bc)  r13(bc)
          Trait 2                                         r22(bb)  r23(bb)     r21(bc)  r22(bc)  r23(bc)
          Trait 3                                                  r33(bb)     r31(bc)  r32(bc)  r33(bc)
Method c  Trait 1                                                              r11(cc)  r12(cc)  r13(cc)
          Trait 2                                                                       r22(cc)  r23(cc)
          Trait 3                                                                                r33(cc)

The main diagonal contains the reliabilities. We might call these entries monotrait–monomethod correlations. In the first diagonal entry, for example, we have r11(aa), the reliability of the measurement instrument that measures trait 1 by means of method a. Adjacent to the main diagonal we have triangles with heterotrait–monomethod correlations. We also have blocks with correlations involving two different methods. Within these blocks, we have diagonals with correlations involving one trait. These monotrait–heteromethod values are the so-called validity diagonals; a gray background in the original figure indicates the monotrait–heteromethod entries.
According to Campbell and Fiske, a validation process is satisfactory if the following conditions hold:
1. Correlations between measurements of the same trait with different methods are significantly larger than 0. Then we have convergence.
2. Correlations between measurements of a trait with different methods are higher than the correlations of different traits measured with the same method; that is, the validity diagonals should be higher than the correlations in the monomethod–heterotrait triangles. In that case, we have discriminant validity.
3. A validity coefficient rii(ab) is larger than the correlations rij(ab) and rji(ab) (i ≠ j) in its heteromethod block.
4. The pattern of correlations is the same in all heterotrait triangles, of both the monomethod and the heteromethod blocks.
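The first three criteria can be checked mechanically. A minimal sketch with a hypothetical 2-trait, 2-method matrix (both the small layout and the correlation values are invented for illustration):

```python
import numpy as np

# Hypothetical 2-trait x 2-method correlation matrix. Row/column order:
#   0 = (trait 1, method a), 1 = (trait 2, method a),
#   2 = (trait 1, method b), 3 = (trait 2, method b)
R = np.array([
    [1.00, 0.30, 0.55, 0.20],
    [0.30, 1.00, 0.25, 0.60],
    [0.55, 0.25, 1.00, 0.35],
    [0.20, 0.60, 0.35, 1.00],
])

validity = [R[0, 2], R[1, 3]]                  # monotrait–heteromethod
heterotrait_monomethod = [R[0, 1], R[2, 3]]    # same method, different traits
heterotrait_heteromethod = [R[0, 3], R[1, 2]]  # different method and trait

# Criterion 1 (convergence): validity diagonals clearly larger than zero
# (a real analysis would use a significance test, not a sign check).
convergence = all(v > 0 for v in validity)

# Criterion 2 (discriminant validity): validity diagonals exceed the
# heterotrait–monomethod correlations.
discriminant = all(v > max(heterotrait_monomethod) for v in validity)

# Criterion 3: each validity coefficient exceeds the
# heterotrait–heteromethod correlations in its block.
criterion3 = all(v > max(heterotrait_heteromethod) for v in validity)

print(convergence, discriminant, criterion3)
```

Criterion 4, the similarity of correlation patterns across triangles, is a matter of judgment (or of formal modeling, as discussed below) rather than a simple inequality.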
Campbell and Fiske considered only informal analysis and eyeballing techniques for the study of multitrait–multimethod matrices. Such matrices, however, may also be analyzed with generalizability theory (Cronbach et al., 1972). An alternative approach is confirmatory factor analysis. It belongs to the class of structural equation modeling (SEM) and is a promising procedure for obtaining evidence of construct-related validity when several constructs are involved in a nomological network. Also, with more than one measure, confirmatory factor analysis with so-called structured means can be used to test hypotheses with respect to the tenability of equivalence conditions (e.g., strictly parallel measures, tau-equivalent measures) for a set of measures. Last but not least, this type of confirmatory factor analysis offers a fruitful approach to test validation. Technical details of the analysis of the multitrait–multimethod matrix by confirmatory factor analysis can be found in Kenny and Kashy (1992), while general details on alternative approaches can be found in Schmitt, Coyle, and Saari (1977) and Schmitt and Stults (1986). The reader should, however, be warned: routine applications of SEM to multitrait–multimethod matrices are doomed to fail because of the many pitfalls in the use of SEM. The lesson is that none of the analytic approaches to multitrait–multimethod matrices should be applied routinely. A thoughtful and well-balanced review of approaches to the multitrait–multimethod matrix has been given by Crano (2000).
Test manuals should provide information on reliability, validity, and test norms. The manuals cannot be exhaustive, however. After publication of a test, new research adds to the validation of test uses.
Summaries of research and critical discussions of tests are needed.
The Mental Measurement Yearbooks fulfill such a function. Let us take the Beck Depression Inventory (BDI), a frequently cited inventory. The BDI is reviewed by two reviewers in the Thirteenth Mental Measurement Yearbook, Carlson (1998) and Waller (1998). The BDI is a brief self-report inventory of depression symptoms. It has 21 items, scored on a four-point scale. The test is used for psychiatric patients; it is also frequently used as a screening device in healthy populations. The manual gives information on reliability, validity, and test norms, but the reviewers argue that the manual is too short: much useful information must be found in other published sources. Several aspects of validity are discussed by the reviewers. The inventory has face validity; the items are transparent. The high face validity makes the inventory vulnerable to faking. Correlations with other tests have been computed and a factor analysis has been done. The inventory discriminates patients from healthy persons. Waller notes, however, that information with respect to discriminant validity is lacking. What is, for example, the correlation of the BDI with an anxiety measure, a measure of a different construct?