Write Stuff – Robograders

It’s the stuff of every neo-Luddite’s worst nightmare: machines and computers that replace human workers. Computers that can replicate human expertise, and even perform tasks thousands of times faster. In education, robograders seem to fit that bill, but the technology has a long way to go before it gets a passing grade.

The word “robograders” doesn’t refer to rows of robots sitting at desks and grading papers. Instead, it’s a catch-all term for various computer programs that use algorithms to score students’ work. Robograders work fine if you’re grading multiple-choice quizzes, but things get a lot murkier when it comes to handing out grades for essays.

It’s true that essay writing, and grading, has always been rife with subjectivity. Whether it’s the subject matter (Shakespeare’s sonnets, for example), the students who write them, or the teachers who grade them, essays don’t lend themselves to the same type of consistency as, say, the multiplication tables.

And these days even human graders can be expected to churn out essay marks at impossible rates. As a recent New York Times article notes, “the Pearson education company expects readers to spend no more than two to three minutes per essay.” Working at top speed, those graders might be “capable of scoring 30 writing samples in an hour.” Not exactly a formula for thoughtful, accurate evaluation.

Still, the typical human grader, whether she’s a fourth-grade teacher or a Harvard professor, has something that robograders simply can’t match: the ability to evaluate a student’s ideas.

Suppose you’ve written an essay full of long, complex sentences and hundred-dollar words. Your vocabulary’s great, but your paper doesn’t make a lot of sense. In fact, you’ve slipped in some clearly faulty facts, like World War II starting when Canada invaded New Zealand.

According to that same New York Times article, at least one robograder would probably love it. As the article explains, Les Perelman, a director of writing at the Massachusetts Institute of Technology, tested an error-ridden essay with a program called e-Rater. Among other conclusions, he found that a “nonsensical” 716-word essay received a higher score than a well-written paper of only 567 words.

The reason? The e-Rater program likes longer essays, whether they make sense or not. It also prefers long words over short ones, meaning that “gargantuan words are indemnified because e-Rater interprets them as a sign of lexical complexity.”

Now let’s take things to the other end of the spectrum. Suppose you’re a teacher who’s received an essay riddled with spelling errors. The language is well below the student’s grade level. Based on those criteria, e-Rater would likely give it a failing mark. But there’s a core of brilliance in that essay: some innovative thoughts on the global financial system, or perhaps a groundbreaking theory on English literature.

Will the robograder spot it? Probably not. Will the professor or his assistant see the merit in it? Perhaps not, since we can’t deny that the art of essay grading is inherently subjective, and poor spelling often raises red flags.

But we cannot remove that critical part of the process: the evaluation of a student’s ideas, not just the language she uses to express them. Otherwise, we’ve failed before we’ve even begun.

S.D. Livingston is the author of several books, including the new suspense novel Kings of Providence. Visit her website for information on her writing (and for more musings on the literary world!).