Summary: What is the meaning and the value of a grade? Grading students in the workplace may be more of a currency exchange than a reflection of psychometric validity.

Description: I have always wanted to understand how to grade so that every student receives the academic grades that he or she deserves. I have given teacher courses on student assessment for preclinical classes for decades, and much boils down to this one objective, translated into reliability and validity. Countries differ in their grading habits. While Anglo-Saxon countries use the well-known letter system, the Dutch have a scale running from 1 to 10 ingrained in their bone marrow (1). I have always found such an interval scale more logical than letters that suggest a nominal scale. Indeed, many Dutch schools use halves or even one or two decimals to suggest a very accurate and very interpretable score, and we have a widespread habit of performing statistical calculations with our numbered grades: averages, standard deviations, correlations, and more advanced statistics. Granting ‘cum laude’ distinctions is important for students’ careers and is based on such calculations, e.g. no grades below 7.0 and an average across all years of 8.0 or higher; an average of 8.0 across national high school exams, for instance, offers guaranteed access to medical school without any further assessment (2).

Over time, however, I have become more cynical about the fairness of grading, and I now consider grading more of a ritual that teachers and schools need to organize education and regulate the conditions for students' progression, as well as a tool to make students study. Don’t get me wrong: these are laudable reasons for our grading habits. But I have increasing trouble explaining the psychometric fairness of any grade to a student.

Let me illustrate this by citing a recent discussion at an education colloquium at our school. Colloquia are regularly held to engage teachers actively in discussions about educational topics. Talks at this well-attended colloquium (over 50 faculty were present) were about grading for research terms. Students at our school must spend 12 weeks or more on a research project and complete a written report, sometimes resulting in an article for a scientific journal. The presenters talked about the sub-competencies that should be valued and scored, eventually resulting in two grades, one for the quality of work during the research term and one for the quality of the report, to be combined into one final grade. At one point the discussion focused on how to value students who hand in their research report many weeks after the end of the research term. Some in the audience strongly supported giving students the opportunity to work further on the report to optimize it, in pursuit of a higher grade. Shouldn’t students be valued for this extra effort? Others found this unfair to students who complied with course rules: if students need more time, wouldn’t that show that they are less competent? The presenters also showed the results of a questionnaire among students. One item asked what they thought affects their grade most. Interestingly, ‘working hard’ during the research term was the factor students mentioned most often as one they expected would optimize their grade.

One of the conclusions at the colloquium was that we really do not know what a grade represents. Psychometricians like to think of grades as representing the best possible measurement of the current level of knowledge or cognitive ability of an individual. Yet in such courses, this ‘measurement’ is clearly at least blurred with a representation of effort spent, rather than cognitive ability. Is a student who worked hard on a report a better researcher? Do we actually measure research ability either by looking at the quality of a report, completed with much guidance from a supervisor, or by valuing the ostensible presence of a student at the lab, sitting behind a screen for many hours?

A grade is like currency. In exchange for what a student does, a grade is awarded. Monetary currency does not just represent the quality of a product, nor just the hours of effort spent to make it, but also what the market is prepared to pay for it, given its scarcity. Maybe a grade, too, represents different facets of the educational process.

Grading students in a group with a uniform written test suggests that we can objectively measure knowledge, particularly if we use automatically scorable items such as in multiple-choice tests. As long as we all agree this works, I have no problems with this suggestion. But as soon as we start grading individual students for individual tasks, with individual raters, grading becomes ritualized and fraught with difficulties that psychometricians can hardly cope with (3). They will ask for large numbers of ratees, raters covering different ratees, and standardization of tasks and contexts, and they will apply generalizability theory to estimate reliabilities. This is fine as long as we can meet such requirements, but usually we can’t. We often give grades based on few students, few raters, few observed tasks and few contexts, or only one. The ritual of workplace-based assessment has recently been called a “socially situated interpretive act” (4), and I would add that the grading that results from this interpretation of skill often resembles the exchange of currency. In practice, a grade is also a ‘present’ to reward behavior, reflecting interpersonal valuations that extend beyond any form of objective assessment of the construct of cognitive or other ability.

In the clinical workplace, or in the research arena, we should maybe just value that students can work unsupervised. There are many conditions that must be met before learners should be trusted to do this, including specific knowledge and skill, a sense of responsibility, a sense of when to ask for help and the willingness to do so, the social skill to collaborate smoothly with others, and more, depending on the task. Giving a single grade for workplace performance may not be very useful, and some authors recommend abandoning grades (5,6). Others recommend replacing grades with the amount of supervision required for an individual (7). That makes more sense than a grade.

References
1. Anonymous. Academic grading in the Netherlands [Internet]. [accessed 6 March 2015]. Available from: http://en.wikipedia.org/wiki/Academic_grading_in_the_Netherlands
2. Ten Cate O. Medical education in the Netherlands. Med Teach. 2007;29(8):752–7.
3. Albanese M. Challenges in using rater judgements in medical education. J Eval Clin Pract. 2000 Aug;6(3):305–19.
4. Govaerts M, van der Vleuten CP. Validity in work-based assessment: expanding our horizons. Med Educ. 2013 Dec;47(12):1164–74.
5. Hanson JL, Rosenberg AA, Lane JL. Narrative descriptions should replace grades and numerical ratings for clinical performance in medical education in the United States. Front Psychol. 2013 Jan;4(November):668.
6. Dannefer EF. Beyond assessment of learning toward assessment for learning: Educating tomorrow’s physicians. Med Teach. 2013 May 3;35:560–3.
7. Weller JM, Misur M, Nicolson S, Morris J, Ure S, Crossley J, et al. Can I leave the theatre? A key to more reliable workplace-based assessment. Br J Anaesth. 2014 Mar 17;1–9.