Grading. In Norway, final exams are marked by a commission of at least two faculty members. Students who are dissatisfied with their grades can appeal and have their papers re-marked by a new commission. In such cases, the new commission is not made aware of the original grade.
The Ministry has expressed its dissatisfaction with this system after several high-profile cases in which students received substantially different grades from the two commissions. At least two students originally given a B were given an F the second time around, while two others originally given Fs were upgraded to As by the new commission! The system for appeals, the Ministry claimed, was not working as intended.
It may not be working as intended, but it certainly is working as anyone familiar with the research could have predicted. The professors carrying out the grading, all of whom are experts in the relevant area, just do not agree on the quality of exam papers. Whether a paper is good or bad is not something that simply needs to be seen to be recognized; experts vary in their opinions about quality. Research on the Norwegian grading system is both clear and disturbing on exactly this point. In one example, the rate of agreement on whether an exam should pass or fail was roughly at chance level. Grading is not objective, and the quality of an exam is not simply a matter of acknowledging the obvious, i.e. of knowing it when you see it.
Peer review. Peer review is well-established internationally as the hallmark of quality control in science. When we think our work is ready to share with our colleagues, we submit an article to an academic journal where editors determine whether or not it should be published in part by administering a peer review process. The submitted article is sent out to internationally leading experts who evaluate it and report back to the editors with their frankest assessment of the quality of the piece.
I would wager that every researcher who has ever submitted an article to a serious journal has had the experience of receiving conflicting reviews — one reviewer may conclude that the paper is almost good enough to be published (because it’s never completely good enough!) and a second may say it’s nowhere close. Indeed, as a former editor of one of the leading journals in my field, I’m sure I am not alone in claiming that a core editorial task is to determine how to weigh or reconcile radically divergent reviews.
These reviews, as noted, are carried out by leading researchers in the relevant field. Why don’t they agree? Quality just isn’t obvious; how one weighs different factors leads to different assessments.
Bias. The research on grading and peer review should not be surprising. Nobel laureate Daniel Kahneman built his career researching the topic of bias, which he wonderfully presents in his book Thinking, Fast and Slow. Radically oversimplifying, his research shows us that decisions and assessments are always made under the influence of context and accumulated experience. This is bias. It affects everything we do.
The grade assigned to a paper is influenced by the quality of the papers read just before it. The same is true of our evaluation of grant applications. Manuscripts of scholarly articles are judged in light of our own views of the subject and perhaps our own discoveries. This is natural; it’s what humans do. We are simply not capable of objectivity. And if the idea behind «I know it when I see it» is that subjective evaluations will all point in the same direction, well, that is a misunderstanding of what subjective means.
Research on bias is nearly as ubiquitous as bias itself. The claim that determinations of quality are exempt from bias rejects the conclusions of a substantial research literature. Yet this is precisely what a slogan like «I know it when I see it» does.
The perfect child. While it may be difficult to define quality, a substantial body of research makes it clear that we don’t reliably recognize it when we see it. Indeed, experts routinely show significant variation in the level of quality they attribute to a particular work. If the goal of the coming white paper on quality is to define quality and then use that definition to manage institutions, the research I’ve pointed to here must be considered.
When it comes to the ways we educate our students, there’s no reason to anticipate consensus. Indeed, the suggestion that we will all agree that someone else’s educational program is of high quality, perhaps even higher than our own, reminds me of the oft-cited proverb «There is just one perfect child, and every mother has it».
Op-ed series on quality
In connection with Kunnskapskonferansen on 2 June, Khrono and Høgskolen i Oslo og Akershus (HiOA) are collaborating on a series of op-eds on quality in higher education and research. First up is the rector of HiOA, Curt Rice.
Wednesday 25.05: Kristin Vinje, member of the Storting, Høyre
Tuesday 24.05: Ragnhild Lied, leader of Unio: Samspel for betre kvalitet (Working together for better quality)
Monday 23.05: Curt Rice, rector of HiOA: Nei, kjære minister, kvalitet er ikke som porno (No, dear Minister, quality is not like porn)