Monday, May 9, 2011

Rubrics Measuring Quality: Impossible

Some time ago, March 2008 to be more precise, I jotted down some ideas on this topic. I'm sharing them now...

The abandonment of old methods of assessment is long overdue, and the transition to performance-based or authentic assessment is well underway. Educators across the country are getting away from giving tests and are turning to portfolios and long-term projects that are meaningful to students. Of course, we’ve all been taught that an integral part of authentic assessment is the creation of fair, objective, and transparent rubrics, but is that really happening? Can a rubric be objective? What does that really mean, and do we really need them? Are rubrics over-applied, and what can they actually measure?
It can be extremely difficult when educators come together to compare and critique rubrics, since most teachers feel their rubrics work for them and so see little need to change them. Teachers of course believe their own rubrics to be fair; otherwise they wouldn’t use them. Teachers and administrators also know that one of the primary purposes of rubrics is to take the subjectivity out of grading.
However, the sad truth is that most rubrics, even those constructed by the best, most dedicated, and fair-minded teachers, authors, and consultants, are often no less subjective than grading without one. How can a rubric designed to measure success contain the word “successfully”? How can a rubric designed to measure how organized a product is contain the word “organized”?
“A quality essay.” “An astute analysis.” “A comprehensive explanation.” None of these can be objectively measured; they all require a subjective decision on the part of the educator. In fact, for a rubric to be truly objective, it can contain no adjectives or adverbs at all.

Consider a rubric, built on a four-point scale, containing indicators like these:
4- Paper is very organized and well-constructed
3- Paper is organized and well-constructed
2- Paper is organized but not well-constructed
1- Paper is not organized

Now, before simply dismissing the rubric above as flawed and different from their own, as so often happens, teachers should look at its inherent problems and then decide whether those problems exist within their own rubrics. In the example, the only difference between a “4” and a “3” is the adverb “very.” Can anyone define what “very” means in this case? Can the teacher clearly explain, without leaving any question, the difference between organized and very organized? If he or she can, in quantifiable terms, then those terms need to be included within the rubric, but I suspect the difference is so subjective as to make quantification nearly impossible, and the result of such an effort silly and forced.
Further, what does “organized” look like? How is it being measured? Even if the paper is compared to a sample considered organized, the scorer is still making a subjective judgment. For that matter, what does “well-constructed” mean? Both are entirely in the eye of the beholder, and therefore subjective; and, as we’ve been told, anything subjective may not be fair, and eliminating subjectivity was one of the primary purposes for which rubrics were developed in the first place. If rubrics only add to the problem they’re supposed to solve, should their use perhaps be discontinued?
Even when we can quantify something, and therefore objectively measure it, that doesn’t necessarily mean we should bother, since what is being measured may have very little to do with learning or thinking. Learning and thinking are key, as opposed to that recent mantra of “Does this aid in teaching and learning?” Well, frankly, if it aids in learning, that’s enough. Learning and thinking are objectives worth striving for; the problem is that measuring thinking and learning is extremely difficult. We can measure whether or not something is included in a paper, and we can measure those things we can count and add up.
For example, consider a rubric containing indicators like these:

4- Paper contains at least 2 paragraphs per topic.
3- Paper contains at least 1 paragraph per topic.
2- Paper contains at least 1 paragraph on most topics.
1- One or more topics completely not addressed.

For a 4 or a 3, the standard is completely measurable, assuming we can agree on what a paragraph is. A paragraph may be as little as a single sentence and still be a paragraph. Sometimes laying out a rubric actually has a chilling effect: it shows talented students how little they need to do to get a desired grade, rather than inspiring the lower-performing student to reach for new heights. Rubrics might add some clarity about what is expected, but they also create an intellectual finish line for most of our students to cross and then stop. Telling even the most talented marathon runner that the finish-line tape is just a suggested stopping point, and that she should feel free to keep pushing herself and run another ten miles, is a bit asinine. This is another clear demonstration of the tension between the theoretical and the practical, in education in general and with rubrics in particular.
In the example rubric above, for a 2, a student would likely immediately ask, “Does ‘most’ mean a majority of the topics have at least a paragraph? Fifty-one percent? Sixty-six percent? Eighty percent? How many topics, precisely, constitute ‘most’? Give me a number.” The student isn’t wrong to ask, since the rubric is unclear. However, the more germane question is, “How does this rubric measure learning or thinking?” The question matters because this sample indicator from a larger rubric does not measure learning or thinking, and therefore perhaps should not be used at all. It might inadequately measure effort, which would differ from student to student based on ability level, and that’s about it. Of course, if one paper has two 1-sentence paragraphs per topic, and the next paper has two 10-sentence paragraphs per topic, this rubric says each student MUST get a “4,” regardless of the quality of the paragraphs. The teacher has no discretion on this indicator; there can be no “rounding” up or down.
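In fact, an indicator this mechanical could be handed off to software entirely, which is telling in itself. Below is a minimal sketch in Python of what such a scorer might look like; the 66 percent threshold for “most” is my own invented assumption, since the rubric never commits to a number:

# Hypothetical scorer for the paragraph-counting indicator above.
# MOST_THRESHOLD is an invented assumption: the rubric never defines
# "most," so any implementation is forced to pick a number on its own.
MOST_THRESHOLD = 0.66

def score_coverage(paragraphs_per_topic: list[int]) -> int:
    """paragraphs_per_topic[i] is the number of paragraphs on topic i."""
    if all(count >= 2 for count in paragraphs_per_topic):
        return 4  # at least 2 paragraphs per topic
    if all(count >= 1 for count in paragraphs_per_topic):
        return 3  # at least 1 paragraph per topic
    covered = sum(1 for count in paragraphs_per_topic if count >= 1)
    if covered / len(paragraphs_per_topic) >= MOST_THRESHOLD:
        return 2  # at least 1 paragraph on "most" topics
    return 1  # one or more topics completely unaddressed

# Two 1-sentence paragraphs per topic and two 10-sentence paragraphs per
# topic both earn a 4; the scorer, like the rubric, is blind to quality.
print(score_coverage([2, 2, 2, 2]))  # 4
print(score_coverage([1, 1, 1, 0]))  # 2 (3 of 4 topics covered)

Notice the code never has to look inside a single paragraph; that is precisely the problem.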
Instead of cheating the rubric by giving a 3.4 or a 3.6 (what’s the demonstrable difference between those?), we educators should ask ourselves why we feel compelled to add 0.4 to a 3, and what our thought process was.
Returning to the last rubric, the verbiage explaining what earns a 1 is the worst of all. Imagine a student, faced with twenty-one topics, writes 40 paragraphs covering 20 of them, and the paragraphs are of such brilliance that they change our knowledge of the universe forever; but because topic #21 did not receive its own paragraph, the paper gets a 1 on this indicator. This, of course, would be ridiculous, yet to “bend” the rules is both supposedly unfair and a complete negation of the purpose of rubrics in the first place.
Sometimes, instead of numbers, we give names to the various levels of performance, such as “meets” or “exceeds,” and this leads to new questions.

Exceeds- Free from errors in grammar and spelling.
Meets- Almost free from errors in grammar and spelling.
Novice- Only a few errors in grammar and spelling.
Not Yet- Many errors in grammar and spelling.

In this rubric, we’re telling students that if they actually manage to write a paper free from errors in grammar and spelling, they are exceeding what we expect of them. Is this right? Shouldn’t we be insisting on papers free from errors in spelling and grammar? As for the names of the performance levels, they will in nearly every case be translated into a numerical score of some type; so again, this rubric gives the illusion of objectivity and innovation without really changing anything substantive. Even when secondary schools refuse to translate the levels into numerical scores, 99 percent of post-secondary schools will do the translation for them.
We also cannot forget that this last rubric includes phrases like “almost free,” “a few errors,” and “many errors.” How can we use such words and not see them as subjective? The only way to make an objective, and therefore fair, rubric of this type is to set actual numbers into it.
Finally, this last rubric measures proficiency in a skill, not necessarily thinking. That is fine, since proficiency was its designed purpose and it functions as designed, but it again demonstrates that perhaps there is a place for rubrics and a place for professional judgment without them.
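As a thought experiment, here is what “setting numbers into the rubric” might look like if pushed all the way to code, again a minimal sketch in Python; the cutoffs (0 errors, 2 errors, 5 errors) are invented assumptions a real department would have to agree on and defend:

# Hypothetical, purely numeric version of the grammar-and-spelling rubric.
# The cutoffs are invented for illustration; nothing in the original rubric
# says where "a few" errors end and "many" begin.
def score_mechanics(error_count: int) -> str:
    if error_count == 0:
        return "Exceeds"
    if error_count <= 2:
        return "Meets"
    if error_count <= 5:
        return "Novice"
    return "Not Yet"

print(score_mechanics(0))  # Exceeds
print(score_mechanics(4))  # Novice

Objective at last, but note what it now measures: the ability to count errors, and nothing about thinking.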
There are even worse lines in rubrics, such as “Project showed creativity.” There is no way to measure creativity. What is really being scored there? Novelty? The less likely the teacher is to have seen something like the student’s effort, the more points on the creativity scale?
Are rubrics a bad thing? No, my point is not that rubrics are bad. Perhaps we should continue to use them, but we should also admit that using them is just as subjective a process as not using them, or at least nearly so. We should also realize that our reliance on rubrics could hamper some thinking and learning. I’m sure the invention of the power drill was an exciting moment, and its use spread like wildfire, but we don’t use one to change light bulbs, do we? Like any other tool, perhaps we need to find out what rubrics do best and limit their use accordingly. Rubrics should not be applied to anything qualitative, only to the quantitative.
Trying to impose rubrics on the quality of student work is tantamount to trying to write computer software that grades thinking, innovation, and creativity. It doesn’t work. Counting paragraphs? Fine. Judging how insightful a paragraph is, or how creative a project is? Software and rubrics can’t do that. Human educators can make those judgment calls, and assign a grade if they must. More importantly, however, teachers can interact and discuss the paragraph or project or portfolio, creating a much more personal learning experience for the student, and continue the learning. Even the most worthwhile rubrics, at best, simply spark that conversation and support it; at worst, and more often, they cripple that discourse, fencing it in. Their efficacy and fairness, especially for measuring learning in a summative and objective way, are at best dubious and more likely educational smoke and mirrors.
#####

