Think Computers Can Replace Humans as Test Graders? Think Again.

A computerized essay reader 'doesn’t care if you say the War of 1812 started in 1945,' says one critic.


If a robot were grading this article as if it were an essay on the SAT, the perfect opening line would go a little something like this: Computerized robotic technology has been shown to be a highly efficient device to grade standardized examinations, however, when put to the test the mechanized system is easy to thwart.

That’s according to The New York Times, which challenged recent findings claiming there was little difference between human and robot graders. The Times had Les Perelman, a director of writing at the Massachusetts Institute of Technology and an opponent of electronic grading, kick the tires on Educational Testing Service’s e-Rater, which the service says it uses in conjunction with human essay readers. (E.T.S., which develops the GRE graduate-school entrance exam, was the only provider that allowed Perelman to test-drive its electronic grader.)

Among other faults, Perelman found that the e-Rater cannot tell truth from fiction, so test takers have little incentive to get their facts straight. (In one test, he responded to a question about high college tuition rates by blaming greedy teaching assistants, who “receive a plethora of extra benefits such as private jets, vacations in the south seas, starring roles in motion pictures.” He got the highest possible score.) The robotic system also prefers long essays, long paragraphs, long sentences and long, complex words. It doesn’t like sentences that start with “and” or “or,” but it does favor words such as “however” and “moreover.” All told, it’s a pretty easy system to game. “Once you understand e-Rater’s biases it’s not hard to raise your test score,” Perelman told the Times.

Perelman’s findings come in response to a large-scale April 13 study from researchers at the University of Akron, which found that human readers and software programs gave roughly the same ratings to some 22,000 essays written by high school students. Computerized systems are undoubtedly the more efficient means of evaluating essays—a human grader can mark at most about 30 essays an hour, while the e-Rater can mow through 16,000 essays in 20 seconds—but when it comes to college admissions, the quality of the writing should probably matter more than the word count.

Kayla Webley is a Staff Writer at TIME.