RELIABILITY AND VALIDITY: A BRIEF OVERVIEW


As just noted, the technical description of each assessment instrument or method includes data on two key psychometric properties: reliability and validity. Because major clinical recommendations, such as those concerning child custody, are often based on clinical assessment, it is essential that the assessment instruments and methods on which these recommendations rest be highly reliable and valid. It is also necessary that the clinician be sufficiently apprised of the appropriate uses and limitations of such devices.

Even though this is a text on assessment, it cannot be assumed that all readers will be sufficiently familiar with these two key psychometric properties to adequately appreciate the technical discussion of the various inventories and assessment methods.

TABLE 1.1 Format for Instrument Description

Name of assessment instrument or method: Indicates the authors or developers and the date of publication and of any revisions of the instrument or method.

Type of instrument: Indicates the type and form of the instrument or method, e.g., self-report, observational, clinician-rated, standard psychological instrument, outcome measure.

Use–target audience: Specifies the main use and the targeted client, i.e., couple, family, parent–child, child custody, etc.

Multicultural: Indicates the cultural applicability of the method; specifies language versions available besides English.

Ease and time of administration: Characterizes the ease of administration of the instrument or method, i.e., easy, complicated, etc.; provides the number of items and the average time needed to complete the instrument; indicates whether a manual is available.

Scoring procedure: Specifies (a) the method, i.e., paper and pencil, etc., and (b) the average time needed to score the instrument; indicates whether an electronic or alternative form of administration or a computer scoring option is available.

Reliability: Specifies the types of reliability coefficients reported, such as test–retest, internal consistency (e.g., Cronbach's alpha), and inter-rater (or scorer) reliability, if applicable; also gives an overall assessment: high, moderate, average, or below-average reliability; if none is reported, states: "no published or reported reliability data."

Validity: Specifies the types of validity reported, e.g., construct-related or criterion-related; if none is reported, states: "no published or reported validity data."

Availability and source: Provides information on the availability and source of the instrument or method; if a journal article provides the assessment device, the journal reference is given; when the device is only commercially available, the name and/or address and phone number of the supplier are given.

Comment: Provides the chapter authors with an opportunity to share professional evaluations of the clinical utility and value of the instrument or method.

Graduate training programs in clinical psychology, counseling psychology, and family psychology are likely to require formal instruction and experience in assessment that addresses psychometric issues. However, other graduate programs, such as marital and family therapy training programs, are less likely to emphasize these concepts. Accordingly, a brief overview of the concepts of reliability and validity is given here, along with descriptions and illustrations of the various types of each.

Reliability

Reliability is the extent to which a test or any assessment procedure yields the same result when repeated. In other words, reliability is the consistency of a measurement, or the degree to which an instrument or assessment device measures the same way each time it is used under the same conditions with the same individual. A measure is considered reliable if an individual's scores on the same test given twice are similar. Technically speaking, reliability is not measured but estimated, and it is represented as a correlation coefficient. The higher the reliability coefficient, the more confidence one can have in the score. Reliability coefficients at or above .70 are considered adequate; those at or above .80 are considered good; those at or above .90 are considered excellent (Hambleton & Zaal, 1991). Three types of reliability can be described: test–retest, internal consistency, and inter-rater reliability.

Test–retest reliability—the agreement of assessment measures over time. To determine it, a measure or test is repeated with the same individuals at a future date. Results are compared and correlated with the initial test to give a measure of stability.
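To make this concrete, the following minimal Python sketch (using NumPy, with invented scores) estimates test–retest reliability as the correlation between two administrations of the same instrument:

```python
import numpy as np

# Hypothetical total scores for eight individuals on the same instrument,
# administered once and then again several weeks later.
time1 = np.array([24, 31, 18, 27, 35, 22, 29, 33])
time2 = np.array([26, 30, 20, 25, 36, 21, 31, 32])

# The test-retest reliability estimate is the correlation between the two
# administrations; np.corrcoef returns the 2x2 correlation matrix.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability estimate: {r_test_retest:.2f}")
```

With these invented scores the two administrations track closely, so the estimate falls in the excellent range by the benchmarks given above.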

Internal consistency—the extent to which the items of a test or procedure assess the same characteristic, skill, or quality. It is a measure of the precision, or homogeneity, of the items that make up a measuring instrument. This type of reliability often helps clinicians and researchers interpret data and predict the value of scores and the limits of the relationships among variables. For example, the Family Assessment Device (FAD) is a questionnaire that evaluates families on seven dimensions of functioning, including communication patterns.

Analyzing the internal consistency of the FAD items on the Communication subscale reveals the extent to which items on this family assessment device actually reflect communication patterns among family members. The internal consistency of a test can be computed in different ways.

Split-halves reliability—a measure of internal consistency derived by correlating responses on one half of the test with responses to the other half. The Spearman–Brown formula is then used to adjust this half-test correlation into an estimate of the reliability of the full-length test.
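As an illustration, here is a brief Python sketch (with invented item responses) that splits a test into odd- and even-numbered items, correlates the two halves, and applies the Spearman–Brown formula:

```python
import numpy as np

# Hypothetical item responses: rows are 6 respondents, columns are 8 items.
items = np.array([
    [4, 3, 4, 5, 3, 4, 4, 5],
    [2, 2, 3, 2, 1, 2, 3, 2],
    [5, 4, 5, 5, 4, 5, 4, 5],
    [3, 3, 2, 3, 3, 3, 2, 3],
    [1, 2, 1, 2, 2, 1, 2, 1],
    [4, 4, 5, 4, 5, 4, 4, 4],
])

# Split the test into odd- and even-numbered items and total each half.
half_a = items[:, 0::2].sum(axis=1)
half_b = items[:, 1::2].sum(axis=1)

# Correlate the two half-scores, then apply the Spearman-Brown formula
# to estimate the reliability of the full-length test.
r_half = np.corrcoef(half_a, half_b)[0, 1]
r_full = (2 * r_half) / (1 + r_half)
print(f"Split-half r = {r_half:.2f}; Spearman-Brown corrected = {r_full:.2f}")
```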

Cronbach's alpha—another, more sophisticated method. Conceptually, it can be thought of as the average of all possible split-half estimates for the instrument. Cronbach's alpha is interpreted like a correlation coefficient: the closer it is to 1, the higher the reliability estimate of the assessment device.
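A minimal sketch of the computation, using the standard formula for alpha (the sum of the item variances relative to the variance of the total scores) and invented data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert-type responses: 6 respondents, 8 items.
items = np.array([
    [4, 3, 4, 5, 3, 4, 4, 5],
    [2, 2, 3, 2, 1, 2, 3, 2],
    [5, 4, 5, 5, 4, 5, 4, 5],
    [3, 3, 2, 3, 3, 3, 2, 3],
    [1, 2, 1, 2, 2, 1, 2, 1],
    [4, 4, 5, 4, 5, 4, 4, 4],
])
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")
```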

Kuder-Richardson coefficient—another means of estimating internal consistency. It is used for instruments or measures that involve dichotomous responses or items, such as yes/no, whereas Cronbach's alpha is used with Likert-scale types of responses or items. Finally, it should be noted that the primary difference between test–retest and internal consistency estimates of reliability is that test–retest involves two administrations of the measure or instrument, whereas internal consistency methods involve only a single administration.
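The following sketch computes the Kuder-Richardson formula 20 (KR-20) for invented yes/no data; for dichotomous items, KR-20 is a special case of Cronbach's alpha:

```python
import numpy as np

# Hypothetical yes(1)/no(0) responses: 6 respondents, 5 items.
answers = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],
])

k = answers.shape[1]
p = answers.mean(axis=0)               # proportion answering "yes" per item
q = 1 - p                              # proportion answering "no" per item
total_var = answers.sum(axis=1).var()  # variance of the total scores

# Kuder-Richardson formula 20.
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20: {kr20:.2f}")
```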

Inter-rater reliability—the extent to which two or more individuals (raters) agree. Inter-rater reliability addresses the consistency of the implementation of a rating system. For example, inter-rater reliability can be established in the following scenario: Two clinical supervisors observe the same family being treated by a counseling intern through a two-way mirror.

As part of the observation, each supervisor independently rates the family’s functioning on the GARF Scale. One supervisor rates the family at 62 and the other at 64 (on the 1- to 100-point scale). Because inter-rater reliability is dependent upon the ability of two or more observers to be consistent, it could be said that inter-rater reliability is very high in this instance.
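In practice, inter-rater reliability is estimated across many rated cases rather than from a single family. A minimal sketch with hypothetical GARF ratings from two supervisors across ten families:

```python
import numpy as np

# Hypothetical GARF ratings (1-100) given independently by two clinical
# supervisors to the same ten families.
rater_1 = np.array([62, 45, 78, 55, 81, 67, 38, 72, 59, 90])
rater_2 = np.array([64, 43, 75, 58, 84, 65, 41, 70, 62, 88])

# A simple inter-rater reliability estimate is the correlation between the
# two raters' scores. (An intraclass correlation would additionally
# penalize systematic differences in rater leniency.)
r_interrater = np.corrcoef(rater_1, rater_2)[0, 1]
print(f"Inter-rater reliability estimate: {r_interrater:.2f}")
```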

Validity

Validity refers to the ability of an assessment device or method to measure what it is intended to measure. Whereas reliability is concerned with the consistency of the assessment device or procedure, validity is concerned with its success at measuring what it was designed to measure. Four types of validity can be described:

Face validity—concerned with how a measure or procedure appears. Does it seem like a reasonable way to gain the information? Does it seem to be well designed? Does it seem as though it will work reliably? Unlike content validity, face validity does not depend on established theories for support (Fink, 1995).

Criterion-related validity, or criterion-referenced validity—used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure previously demonstrated to be valid. For example, a paper-and-pencil test of family functioning, the SFI, appears to measure the same family dynamics and functioning as a related observational assessment, the Beavers Interactional Scales. By comparing family members' self-reports of family functioning with the therapist's observational ratings of family functioning, the SFI was validated using a criterion-related strategy in which self-report scores were compared with Beavers Interactional Scales ratings.
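A hedged sketch of this strategy, with invented scores standing in for actual SFI and Beavers data: the validity coefficient is simply the correlation between the measure being validated and the criterion measure.

```python
import numpy as np

# Hypothetical data: SFI self-report scores for eight families and the
# therapist's ratings of the same families on the Beavers Interactional
# Scales (the previously validated criterion).
sfi_scores = np.array([2.1, 3.4, 1.8, 2.9, 4.2, 2.5, 3.7, 1.5])
beavers_ratings = np.array([2.3, 3.1, 2.0, 3.0, 4.0, 2.8, 3.5, 1.7])

# The criterion-related validity coefficient is the correlation between
# the new measure and the established criterion.
validity_coefficient = np.corrcoef(sfi_scores, beavers_ratings)[0, 1]
print(f"Criterion-related validity coefficient: {validity_coefficient:.2f}")
```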


Construct validity—seeks agreement between a theoretical concept and a specific measuring device or procedure. For example, a family researcher developing a new inventory for assessing marital intimacy might spend considerable time specifying the theoretical boundaries of the term intimacy and then operationally defining it with specific test items or a rating schema in order to achieve an acceptable level of construct validity.

Convergent validity and discriminant validity—two subcategories of construct validity. Convergent validity is the actual general agreement among independently gathered ratings of measures that should be theoretically related. Discriminant validity is the lack of a relationship among measures that theoretically should not be related. To determine whether an assessment device has construct validity, three steps are followed. First, the theoretical relationships are specified. Next, the empirical relationships between the measures of the concepts are examined. Finally, the empirical evidence is interpreted in terms of how it clarifies the construct validity of the particular measure being tested (Carmines & Zeller, 1991, p. 23).
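These two ideas can be illustrated with a small sketch: a hypothetical new intimacy inventory should correlate highly with an established intimacy measure (convergent) and only weakly with a theoretically unrelated trait (discriminant). All scores below are invented.

```python
import numpy as np

# Hypothetical scores for ten couples on (a) a new marital-intimacy
# inventory, (b) an established intimacy measure (theoretically related),
# and (c) a theoretically unrelated trait.
new_intimacy = np.array([12, 18, 9, 15, 21, 14, 17, 11, 20, 16])
established_intimacy = np.array([13, 17, 10, 14, 22, 15, 16, 12, 19, 15])
unrelated_trait = np.array([20, 24, 18, 25, 19, 23, 17, 21, 22, 20])

# Convergent validity: high correlation with the related measure.
# Discriminant validity: near-zero correlation with the unrelated one.
r_convergent = np.corrcoef(new_intimacy, established_intimacy)[0, 1]
r_discriminant = np.corrcoef(new_intimacy, unrelated_trait)[0, 1]
print(f"Convergent r = {r_convergent:.2f}; discriminant r = {r_discriminant:.2f}")
```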

Content validity—based on the extent to which a measurement reflects the specific intended domain of content (Carmines & Zeller, 1991, p. 20).

Content validity can be illustrated with the following example: Family researchers attempting to measure a family structural dimension such as adaptability must decide what constitutes a relevant domain of content for that dimension. They may look for commonalities among several definitions of adaptability, or they may use the Delphi technique or a similar strategy to reach a consensus conceptualization of family adaptability among a group of recognized experts on the topic.

Correlation coefficients can be derived for criterion-related and construct validity. Validity coefficients for assessment devices tend to be much lower than reliability coefficients. For example, validity coefficients for the MMPI-2 are about .30.
