Author: Kristen di Gennaro, Pace University
In the United States, many students planning to attend college are required to take standardized tests, such as the SAT, as part of the college application process. Currently the SAT includes sections measuring critical reading, math, and writing skills considered necessary for success in college. In 2004, the College Board (which controls and revises the SAT) introduced an essay component as part of the test. Many college writing programs applauded this change, as it supported the position that a test measuring writing ability must actually require test takers to write, rather than simply respond to multiple-choice or short-answer questions about grammar, spelling, and mechanics.
College administrators welcomed the change as well, seeing it as an opportunity to reduce or even eliminate the writing placement exams that incoming students take upon their acceptance or arrival on campus. Indeed, using students’ scores from externally administered admissions tests for internal purposes was immediately seen as both time-saving and cost-effective for colleges. The practice also appealed to those who believe that students are subjected to too much testing. Who would object to this multi-purpose test?
Yet in 2014, just a decade after the SAT essay component’s debut, the College Board downgraded the essay to optional status, meaning it is no longer a required section of the test. If the essay is so useful for college writing faculty and administrators, why would its creators essentially discourage its use? Wouldn’t demand from stakeholders ensure its ongoing success?
When test users multi-task SAT scores, however, they fail to realize that the test lacks validity for these additional uses. Most people understand test validity to mean that a test measures what it claims to measure: A driving test measures driving skills and a writing test measures writing ability. What many people fail to realize, however, is that a test in and of itself does not have or lack validity, but the purposes for which it is applied can be more or less valid. According to the American Educational Research Association, a test’s validity depends upon its intended use. For example, using results from a driving test to award or deny someone a driver’s license is a valid use of a driving test, but using the same results to award or deny high school diplomas would not be considered a valid use. The driving test has little or no validity as an indication of who deserves a high school diploma. A clear-cut case such as this is easy to understand. Less clear are cases where different tests adopt a similar format, such as the SAT writing component and a college writing placement test.
The SAT is a norm-referenced test. This means that scoring the test involves identifying the median, or middle score, produced by a group of test takers. Test takers’ scores are then arranged above and below the median to create a so-called normal curve (hence the term norm-referenced), also called a bell curve given its shape. The curve takes this shape because the vast majority of test takers’ scores cluster slightly above and slightly below the median, while progressively fewer scores fall toward the extremes, creating the bell shape.
Norm-referenced tests are designed to compare test takers to the norm and to one another. Thus, spreading scores along a normal curve allows for easy classification of test takers’ scores as average, above average, or below average in relation to the group. For this reason, the results of norm-referenced tests are reported in percentiles, not percentages, as percentiles indicate a test taker’s score in relation to the rest of the group. For example, a score at the 95th percentile indicates that the test taker performed better than 95% of the other test takers who took the same test, not that the test taker answered 95% of the questions correctly. In fact, a percentile score says nothing about how many questions a test taker answered correctly or incorrectly, or about the test taker’s mastery of any particular area of knowledge or skill, but only how the test taker compared with the rest of the group.
Norm-referenced tests are useful for making decisions based on group comparisons. The primary purpose of the SAT is to allow test users, including students, guidance counselors, and college admissions officers, to make comparisons across test takers from a variety of secondary schools in order to determine which students are most likely to succeed at certain colleges. The SAT, by design, is not aligned with any particular curriculum, since it must be relevant for test takers from a vast range of programs and schools across the United States.
The primary purpose of a placement test, on the other hand, is to determine where in a specific program students should begin their coursework. Programs following developmentally based curricula, such as mathematics, foreign languages, and writing, may rely on some form of placement testing for incoming students. Unlike nationally relevant tests such as the SAT, placement tests must be closely linked to the curriculum of the specific program for which they were designed. Tests aligned with a particular program or curriculum fall under the category of criterion-referenced tests.
Criterion-referenced tests are useful when test takers need to demonstrate mastery of specific knowledge, ability, or skills. The driving test mentioned earlier serves as a good example of a criterion-referenced test, since test takers have to demonstrate sufficient knowledge of the rules of the road so as not to create a hazard to other drivers or pedestrians. A test taker’s performance, or score, in relation to other test takers is irrelevant. If a test taker performs better than 95% of other test takers but all failed to respect the rules of the road, no one is awarded a driver’s license. Likewise, if everyone studies and practices for the test and all perform well, they all receive licenses.
The purpose of a writing placement test is to determine which course within a specific program best matches a student’s writing strengths and weaknesses. For a placement test to be useful, it must be aligned with a program’s course offerings. The best way to achieve this necessary alignment is through the development of locally designed placement tools, or tools that are adapted for the local environment.
In many cases, local placement testing involves having students submit writing samples that are then evaluated by faculty members familiar with the local course options. Students might be asked to create their writing samples on-site, in response to writing tasks designed by faculty, or may be required to submit portfolios showcasing a variety of writing samples. More recently, several programs are experimenting with directed self-placement, a type of self-assessment process where students respond to questions intended to raise self-awareness of their own strengths and weaknesses in writing. Students might then select a course based on their self-assessments, or programs might combine students’ self-assessment responses with their writing samples to make placement decisions.

So, there are at least two reasons for rejecting SAT scores as a substitute for placement testing. One is that a test’s validity depends on its intended use, and since the SAT was not designed as a placement test, it lacks validity for this purpose. The second reason is that the SAT is a norm-referenced test not aligned with a particular curriculum, while a college writing placement test is a criterion-referenced test with results linked to specific course content.
Many people, especially college writing faculty, interpret the College Board’s decision to minimize the role of the SAT essay test as an admission that it was a poor measure of writing ability. According to Les Perelman, retired MIT professor and outspoken critic of the SAT essay test since its inception, giving test takers only 25 minutes to read, reflect, and respond intelligently to a topic in writing is an “absurd” task. (See Chris Anson and Les Perelman’s chapter in this book for more on the validity of standardized writing tests.)
While it’s popular to criticize those in the testing industry for creating bad or invalid tests, a test is not inherently good or bad, valid or invalid. Judging the validity of a test goes beyond considerations of format and content. Even a 25-minute writing task can have validity for certain uses. And a test that has the desired format and content coverage is only valid when used for the purpose for which it was created. As soon as a test is used for an additional purpose, its validity is called into question. Thus, rather than blame the College Board for not designing the SAT essay along the model of a criterion-referenced placement test, writing assessment experts should blame those who multi-task SAT scores, misusing them for purposes for which they were not intended. Perhaps the College Board’s decision to downgrade the SAT essay component will prevent further irresponsible uses of their tests.
For a brief and accessible overview of basic concepts in educational assessment, see Gerald W. Bracey’s “Thinking About Tests and Testing: A Short Primer in Assessment Literacy.” For a more in-depth look at academic testing and how educators measure student knowledge, see the Standards for Educational and Psychological Testing, developed jointly by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. The National Council of Teachers of English and Writing Program Administrators also jointly address issues specific to post-secondary education contexts in their “White Paper on Writing Assessment in Colleges and Universities.” Finally, Les Perelman has been very open with his criticism of the SAT, as exemplified in Joanna Weiss’s piece “The man who killed the SAT essay” (The Boston Globe) and “Interview with Les Perelman” by Karyn Hollis (Kairos: A Journal of Rhetoric, Technology, and Pedagogy).
criterion-referenced tests, norm-referenced tests, placement testing, SAT, test use and misuse, validity, writing assessment
Kristen di Gennaro is assistant professor and director of composition at Pace University in New York City, where she also teaches undergraduate writing courses. She specializes in writing assessment and pedagogy with a particular focus on second-language writers. Her research has appeared in numerous academic journals.