K.26: A Theory of Ethics for Writing Assessment: Risk and Reward for Civil Rights, Program Assessment, and Large Scale Testing
Reviewed by Katrina L. Miller, University of Nevada, Reno, NV (katrinamiller@unr.edu)
Speakers: Norbert Elliot, New Jersey Institute of Technology, Newark, NJ, “A Theory of Ethics for Writing Assessment”
Mya Poe, Northeastern University, Boston, MA, “Civil Rights and Writing Assessment: Societal Action as Validation”
Bob Broad, Illinois State University, Normal, IL, “Gullibility and Blindness in Large-Scale Testing”
David Slomp, University of Lethbridge, Alberta, Canada, “Writing Program Assessment: Consequences as an Integrated Framework”
Respondent: Doug Baldwin, Educational Testing Service, Princeton, NJ
Over the last several years I have been pleasantly surprised by an uptick in CCCC sessions focused not just on local writing assessment practices but on writing assessment theory more broadly. This year I was impressed by many original and provocative arguments positing new directions for writing assessment theory. One panel stood out for its pertinence to pressing and complicated questions about the potential harm of writing assessment. In “A Theory of Ethics for Writing Assessment: Risk and Reward for Civil Rights, Program Assessment, and Large Scale Testing,” panelists offered a rich and challenging set of social justice frameworks that help develop an overarching theory of writing assessment ethics. While writing assessment theory has come a long way in terms of coherence, the edges begin to fray when we ask questions about the ethics of assessment. Attentive to this fraying, each speaker in this session explored a unifying theory for ethical writing assessment informed either by pre-existing frameworks outside the field of writing studies (such as Civil Rights legislation) or by more familiar disciplinary understandings about the importance of fairness.
Both Norbert Elliot and Mya Poe referenced their 2014 CCC article that described and advocated for what is known in the legal field as disparate impact analysis, a method for proving unintentional inequities in a practice or policy by blending quantitative information with contextualized reasoning (Poe et al., 2014). Elliot’s presentation provided a useful and thorough theoretical overview of ethics. Foregrounding the consequences of assessment, he argued, enables us to see the moral, intellectual, and practical impacts within specific contexts. The most significant takeaway from Elliot’s presentation was that whatever theory of ethics one adopts or develops, such a framework is necessary so that we are not blind to the implications of our assessment practices.
Poe’s presentation further explained disparate impact analysis as an elegant and simple method for proving a test is unintentionally discriminatory by pairing statistical evidence with contextualized claims. Citing Titles VI and VII of the 1964 Civil Rights Act, Poe argued that disparate impact analysis is uniquely attentive to how current opportunity is limited by past discrimination. Poe concluded that by blending best practices of localism, empiricism, and reflection, disparate impact analysis offers a means of linking consequence and action in order to make assessments more ethical.
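To make the statistical half of this method concrete, consider a minimal sketch (my own illustration, not drawn from the panel) of the “four-fifths rule,” a common screening heuristic in U.S. disparate impact law: a test is statistically flagged when one group’s pass rate falls below 80% of the most successful group’s. The group labels and pass rates below are invented, and the sketch shows only the quantitative evidence that Poe’s method would pair with contextualized reasoning.

```python
# Minimal sketch of the statistical side of disparate impact analysis:
# the "four-fifths" (80%) rule from U.S. employment-discrimination
# screening. All group labels and pass rates are invented for
# illustration.

def impact_ratio(focal_rate: float, reference_rate: float) -> float:
    """Ratio of a focal group's pass rate to the reference group's."""
    return focal_rate / reference_rate

# Hypothetical pass rates on a hypothetical placement test.
pass_rates = {"Group A": 0.84, "Group B": 0.62}

# The reference group is conventionally the most successful one.
reference = max(pass_rates, key=pass_rates.get)

for group, rate in pass_rates.items():
    if group == reference:
        continue
    ratio = impact_ratio(rate, pass_rates[reference])
    # A ratio below 0.8 is a statistical flag for adverse impact:
    # evidence that invites scrutiny, not proof of discrimination.
    print(f"{group}: ratio = {ratio:.2f}, flagged = {ratio < 0.8}")
```

Such a ratio is only a starting point for inquiry; as Poe emphasized, the contextualized, evidentiary work determines whether a flagged disparity reflects discriminatory consequences.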
Pivoting from the considerations of test takers’ experiences to those of test designers, Bob Broad argued that psychometricians are necessarily blind to the educational consequences of standardized testing because their livelihoods depend upon the belief that testing is at worst neutral and at best a positive force on education. Framed by a powerful Upton Sinclair quotation—“It is difficult to get a man to understand something, when his salary depends on his not understanding it”—Broad’s critique explored how professional codes of conduct for educational testing elide consequences. Specifically, Broad contended that Educational Testing Service’s and Pearson’s professional codes of conduct include no mention of educational consequences, which he sees as a glaring and telling omission.
Finally, David Slomp focused on the social consequences of assessment by arguing there is a gap between validity as a theory and validity as a process. To bridge this gap, he presented a revised version of his validation framework from a 2014 Research in the Teaching of English article (Slomp et al., 2014). Slomp’s protocol involves five related processes: defining the purpose and context, defining assessment design, defining scoring procedures, interpreting assessment scores, and assessing consequences. Each of these processes can be further broken down into essential questions about the design and use of an assessment. Defining the purpose and context for an assessment, for example, must include defining the construct being assessed. If one were to define the construct of effective writing as including facets such as evidence that the writing was developed through stages of drafting and revising, an original and well-developed central idea, a logical textual structure, and few (if any) sentence-level errors, then a valid assessment must account for and score all of these facets or risk construct underrepresentation. Like Poe, Slomp offered a robust and systematic approach for considering the effects of assessments. Answering a series of test-centered and context-oriented questions, he argued, is a means of approaching validation as a process of structured inquiry more sensitive to consequences than previous validation models.
Doug Baldwin of Educational Testing Service served as the respondent for the panel. Although I anticipated a more defensive response (especially in light of the social justice theme of the panel and Broad’s pointed critique of the testing industry), Baldwin’s comments were polite and measured. He read what appeared to be a prepared statement about education as a “peculiar institution” rife with tensions. He agreed with the panelists that localized assessment practices do not guarantee fairness but disagreed with Broad’s claim that educational testing specialists do not consider questions about assessment consequences. To support this claim, Baldwin pointed to his chapter on fundamental challenges in developing and scoring educational assessments in Elliot and Perelman’s 2010 edited collection as an example of educational measurement scholarship addressing testing consequences (Baldwin, 2010). In that chapter, Baldwin argued that fairness exists separately from, but remains closely tied to, the traditional psychometric concerns of reliability and validity.
Overall, this was one of the richest and most challenging sessions I attended. The presentations represented a promising future for writing assessment theory. The panelists engaged a wide array of theories about ethics and fairness and presented thought-provoking critiques that challenged me to refine my own assessment philosophy to be more sensitive to issues of discrimination and disparate effects on minority students. These panelists embodied the continued development and uptake of theories native to the field of writing assessment rather than borrowed from educational measurement. In other words, they represented the best of what contemporary writing assessment scholarship has to offer.
References
Baldwin, Doug. (2010). Fundamental challenges in developing and scoring constructed-response assessments. In Norbert Elliot & Les Perelman (Eds.), Writing assessment in the 21st century: Essays in honor of Edward M. White (pp. 327–341). Cresskill, NJ: Hampton Press.
Poe, Mya, Elliot, Norbert, Cogan Jr., John Aloysius, & Nurudeen Jr., Tito G. (2014). The legal and the local: Using disparate impact analysis to understand the consequences of writing assessment. College Composition and Communication, 65(4), 588–611.
Slomp, David, Corrigan, Julie, & Sugimoto, Tamiko. (2014). A framework for using consequential validity as evidence in evaluating large-scale writing assessment: A Canadian study. Research in the Teaching of English, 48(3), 276–302.