Teaching Rant of the Day: auto-grading and sloppy test design
Some problems with the digital tools we take for granted in teaching emerge as side effects of use; others come from misusing the tool. The auto-grading and quiz (in)capabilities of modern LMSes fall into both categories: they produce side effects and they encourage bad test design habits.
Here's what I mean:
Auto-grading byproducts: As with so many features of educational technology, trouble comes from trying to force our practice into the affordances of the product. Auto-grading doesn't work equally well for every type of content, of course. It works best where there is a one-to-one match, where grading is a simple lookup. Multiple choice is the easy case. Fill-in-the-blank is a step messier: you either have to supply many possible answers (anticipating what your students might say that is right enough even if not letter-for-letter identical to your answer) or specify some degree of fuzziness in the matching (e.g., ignore capitalization). Fill-in-the-blank phrases or short sentence responses, anything beyond a single word or two, are bound to add time to a teacher's workload, either up front in laying out the questions (and potential responses) or afterward in reviewing the grades. And in the meantime, students are most certainly going to be annoyed/confused/angry to see a correct answer marked as incorrect because of a technological inability to match complex answers.
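To make that brittleness concrete, here is a minimal sketch of the kind of lookup a fill-in-the-blank auto-grader performs. This is illustrative Python, not any particular LMS's implementation, and the Latin sentence and its accepted translations are made-up examples:

```python
# Illustrative sketch of fill-in-the-blank auto-grading: a lookup against a
# teacher-entered list of accepted answers, with a little optional fuzziness.
# Not any real LMS's code; the question and answers are invented for the example.

def normalize(answer: str) -> str:
    """The kind of fuzziness a quiz tool typically offers:
    ignore capitalization and surrounding whitespace."""
    return answer.strip().lower()

def grade_blank(student_answer: str, accepted_answers: list[str]) -> bool:
    """Correct only if the normalized student answer exactly matches
    one of the teacher-entered accepted answers."""
    return normalize(student_answer) in {normalize(a) for a in accepted_answers}

# Question: translate "puella nautam amat."
accepted = ["The girl loves the sailor.", "the girl loves the sailor"]

print(grade_blank("the girl loves the sailor.", accepted))  # True: capitalization is forgiven
print(grade_blank("A girl loves the sailor.", accepted))    # False: equally good translation, not in the list
print(grade_blank("The girl loves a sailor.", accepted))    # False: same problem
```

The teacher has to anticipate every acceptable variant in advance; anything outside that list, however correct, gets marked wrong.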
Let me illustrate with a use case I saw recently in a language class. The question asked for a translation of a simple sentence. But of course there are minor variations in translation that are substantively irrelevant yet, for a computer, no longer identical answers. The teacher had to go through the quiz answers by hand anyway in order to correct things that had been auto-graded incorrectly. In the meantime, the apparently instantaneous feedback to students was suspect until reviewed. Why add the extra step? What has been gained here?
Fostering bad habits in test design: When all you've got is a hammer, everything looks like a nail, right? This is particularly true of auto-graded test builders that are constructed primarily as a means to administer multiple choice assessments. Even when they offer multiple question types, these are all essentially forms of matching student input to a predetermined, teacher-entered “answer”. That may seem like it is simply the default thing that a “test” does, but if we step back and ask about assessment, well, then, no, this is only one way — a very traditional way perhaps — of doing assessment. It is a way of assessing a particular type of knowledge, usually atomistic, and one not particularly suited to assessing skills or synthetic knowledge. Done well, multiple choice questions can serve a wide variety of uses, and I wouldn't deny that carefully thought-out assessments of this sort can do a lot more than simply have students regurgitate content. The problem is that it takes a lot of work, a lot of intentional and careful planning, to turn complex questions and issues into something that can be assessed by multiple choice tests.
To continue with the case of the language class: for a matching exercise, the auto-grader had been set up with a one-to-one key. Each item had exactly one answer. But for this content (Latin declensions), many items had multiple possible answers. So students could, again, put down a correct answer and have it marked wrong. At that point, what exactly does the time invested in wrangling the quiz into this digital form gain over asking everything as an open-ended question evaluated by the teacher? I suspect that this particular teacher would have spent less time overall by simply asking the questions and having students submit a list of their answers on paper, in a text document, or as text fields in the online quiz.
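For contrast, here is a sketch of the difference between the one-to-one key the quiz tool effectively stores and the set of acceptable answers the content actually requires. Again, this is illustrative Python with a made-up declension item, not the teacher's actual quiz:

```python
# Illustrative contrast between a one-to-one answer key and one that accepts
# alternatives. The Latin item is invented for the example, not taken from the quiz.

# What a one-to-one quiz key stores: exactly one answer per item.
rigid_key = {
    "identify the form 'puellae'": "genitive singular",
}

# What the content requires: 'puellae' is also dative singular and nominative plural.
flexible_key = {
    "identify the form 'puellae'": {"genitive singular", "dative singular", "nominative plural"},
}

def grade(item: str, student_answer: str, key: dict) -> bool:
    accepted = key[item]
    if isinstance(accepted, str):   # treat a single stored answer as a one-element set
        accepted = {accepted}
    return student_answer.strip().lower() in {a.lower() for a in accepted}

print(grade("identify the form 'puellae'", "nominative plural", rigid_key))     # False: correct, but marked wrong
print(grade("identify the form 'puellae'", "nominative plural", flexible_key))  # True
```

Even the flexible version still requires the teacher to enumerate every acceptable answer ahead of time, which is exactly the up-front work the open-ended, teacher-graded alternative avoids.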
Fostering bad habits in students: Students get used to the idea that their test-taking software is unreliable and prone to errors, that they can do things right and have them counted as wrong. It also puts teacher errors right in their faces. Nothing good comes of that, not simply because it undermines trust at all levels, but also because it gets students into a habit of failure. It may not be their failure, but it reinforces the idea that the end result of any exercise is some sort of not working.
Here I would take an example from music teaching. I've watched plenty of younger students learning an instrument struggle with a passage. (And, ok, I have done this plenty too.) They keep going too fast or simply blundering around with a complex bit. But they play it messed up, maybe curse at themselves, and then move on to the stuff that sounds better. As a very good teacher of mine pointed out, that means you're just practicing what it's like to play the passage badly. You want to practice it going well as much as possible, even if that means doing it at an incredibly slow speed or doing it in some very small bit. The lesson is simple. Don't practice your mistakes. That just gets you used to doing it wrong. Fix the mistakes by taking the smallest part that you can do and doing it slowly. Then repeat it, speed it up, and eventually enlarge it. Practice what it feels like for things to work.
Frictionless technology: I suppose that's my concern with auto-grading, the promise of frictionless technology followed, always, by the frustration of the technology in practice. Certainly the reality doesn't live up to the hype. But there are other effects of now-ubiquitous technologies like auto-grading and online quizzes. These are all forms of technological friction, forms that are difficult to control and alleviate. Maybe these frustrations are so obvious that everyone just takes them for granted. But that's part of the problem. Using such tools is an expectation, a norm, a habit. We need to question that norm and habit at every turn, not out of some luddite impulse to throw the tools out, but because they need to be used in more thoughtful ways or, in many cases, not at all.