Introduction to Validity.

Presentation to the National Assessment Governing Board

Gregory J. Cizek, PhD
University of North Carolina at Chapel Hill
McLean, VA
August 2007

Validity is important.
* 'one of the major deities in the pantheon of the psychometrician...'
(Ebel, 1961, p. 640)

* 'the most fundamental consideration in developing and evaluating tests'
(AERA, APA, NCME, 1999, p. 9)

Two Important Concepts
1) Construct
* a label used to describe behavior
* refers to an unobserved (latent) characteristic of interest
* Examples: creativity, intelligence, reading comprehension, preparedness

Construct (continued)
* don't exist -- 'the product of informed scientific imagination'
* operationalized via a measurement process

Two Important Concepts
2) Inference
* 'Informed leap' from an observed, measured value to an estimate of underlying standing on a construct
* Short vs. long inferential leaps (e.g., writing assessment)

Inference (continued)
'I want to go from what I have but don't want, to what I want but can't get.... That's called inference.'
(Wright, 1994)

Validity Defined
'[Validity] is an integrated [on-going] evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores...'
(Messick, 1989, p. 13)

'[Validation] begins with an explicit statement of the proposed interpretations of test scores'
(AERA, APA, NCME, 1999, p. 9)

'Unitary View' of Validity
* No distinct 'kinds' of validity
* Rather, many potential sources of evidence bearing on appropriateness of inference related to the construct of interest
* All validity is construct validity

Sources of Validity Evidence
1) Evidence based on Test Content
* content validity
* test development process
* bias/sensitivity review
* item tryout; statistical review
* alignment

Sources of Validity Evidence (cont'd)
2) Evidence based on Response Processes
* higher order thinking skills
* cognitive labs
* think-aloud protocols; show your work

Sources of Validity Evidence (cont'd)
3) Evidence based on Internal Structure
* support for subscore reporting, intended test dimensions
* factor analysis, coefficient alpha
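The coefficient alpha statistic mentioned above can be computed from an item-score matrix. A minimal sketch follows; the item responses are hypothetical illustration values (not NAEP data), used only to show how high inter-item consistency supports claims about intended test dimensions.

```python
# Coefficient alpha: internal-consistency evidence that a set of items
# functions as a single dimension.
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)

def coefficient_alpha(scores):
    """scores: one row per examinee, each row a list of k item scores."""
    k = len(scores[0])

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Five examinees, four dichotomously scored items (hypothetical data)
data = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(data), 3))  # about 0.79
```

A low alpha for an intended subscore would count as evidence against reporting that subscore separately.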

Sources of Validity Evidence (cont'd)
4) Evidence based on Relations to Other Variables
* Criterion-related evidence
(concurrent, predictive)
* Convergent and discriminant evidence
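Criterion-related evidence typically takes the form of a correlation between test scores and an external criterion measure. The sketch below uses a plain Pearson correlation with hypothetical scores and criterion values (the "placement test" and "course GPA" labels are illustrative assumptions, not part of the source).

```python
# Criterion-related validity evidence: correlate test scores with an
# external criterion (concurrent if gathered at the same time,
# predictive if the criterion is observed later).

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

test_scores = [12, 15, 9, 18, 14, 11]       # hypothetical placement test
criterion = [2.1, 3.0, 1.8, 3.6, 2.9, 2.4]  # later course GPA (predictive)
print(round(pearson_r(test_scores, criterion), 2))  # about 0.97
```

Convergent evidence would show strong correlations with measures of the same construct; discriminant evidence would show weak correlations with measures of different constructs.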

Sources of Validity Evidence (cont'd)
5) Evidence based on Consequences of Testing
'Tests are commonly administered in the expectation that some benefit will be realized from the intended use of the scores... A fundamental purpose of validation is to indicate whether these specific benefits are realized.' (AERA, APA, NCME, 1999, p. 16)

Some Current Validity Issues
1) Doing it.
'one of the major deities in the pantheon of the psychometrician. It is universally praised, but the good works done in its name are remarkably few.' (Ebel, 1961, p. 640)

Validity Issues (cont'd)
'Validity theory... seems to have been more successful in developing general frameworks for analysis than in providing clear guidance on how to validate specific interpretations and uses of measurements.'
(Kane, 2006, p. 18)

Validity Issues (cont'd)
2) Understanding it.
'For a concept that is the foundation of virtually all aspects of our measurement work, it seems that the term validity continues to be one of the most misunderstood or widely misused of all.' (Frisbie, 2005, p. 21)

Validity Issues (cont'd)
'There is a great deal more in what Cronbach and Messick have suggested [regarding validity] than is acknowledged or accepted by the field.'
(Shepard, 1993, p. 406)

Summary
1) Inference
2) Important
3) Iterative evaluation of evidence
4) Issues

Questions for Discussion
How does NAEP establish validity?
What issues will the Board have to deal with to develop the preparedness construct and to measure it?
What are the indicators of good research on validity?
What is the value of face validity in test development?

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education [AERA/APA/NCME]. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement, 2nd ed. (pp. 443-507). Washington, DC: American Council on Education.
Ebel, R. L. (1961). Must all tests be valid? American Psychologist, 16, 640-647.
Frisbie, D. A. (2005). Measurement 101: Some fundamentals revisited. Educational Measurement: Issues and Practice, 24(3), 21-28.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement, 4th ed. (pp. 17-64). Westport, CT: Praeger.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement, 3rd ed. (pp. 13-103). New York: Macmillan.
Shepard, L. A. (1993). Evaluating test validity. Review of Research in Education, 19, 405-450.
Wright, B. D. (1994). Introduction to the Rasch model [videocassette]. Available from College of Education, University of Denver, CO.

NAEP Related Resources
For further information regarding NAEP validity, visit these sites: