Wednesday, 13 August 2014

Grade deflation

Results from Princeton's grade deflation experiment are well worth reading. Catherine Rampell reports on it here and here. Princeton aimed to cut the proportion of As awarded and thereby reduce variability in grading across departments. Some departments responded more vigorously than others, but mostly in their 100- and 200-level offerings.

At Canterbury, we always had a measure called the Difficulty Index. It wasn't perfect, but it was a lot better than just targeting the proportion of As. It worked as follows. Each student's grade in a particular class was compared to that student's average across all other courses that year. So if a student in my Public Choice class got a B+ but got an A in all the other classes, my class would get a notch up in the difficulty index for that student. Do that for all students in all courses and you have a difficulty index. Where enough students take courses across different departments, you can get cross-departmental measures of difficulty.
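The Difficulty Index described above can be sketched in a few lines of Python. This is a minimal sketch, not Canterbury's actual code: the numeric grade scale (`GRADE_POINTS`, A+ = 9 down to C- = 1), the record format, and the function name `difficulty_index` are all assumptions for illustration. A positive value means students scored below their own average in that course; a negative value means they scored above it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical numeric grade scale; an assumption, not necessarily Canterbury's.
GRADE_POINTS = {"A+": 9, "A": 8, "A-": 7, "B+": 6, "B": 5, "B-": 4,
                "C+": 3, "C": 2, "C-": 1}


def difficulty_index(records):
    """records: list of (student, course, letter_grade) for one year.

    For each course, average over its students the gap between the student's
    mean grade in their *other* courses and their grade in this course.
    """
    by_student = defaultdict(dict)  # student -> {course: grade points}
    for student, course, grade in records:
        by_student[student][course] = GRADE_POINTS[grade]

    gaps = defaultdict(list)  # course -> list of per-student gaps
    for courses in by_student.values():
        for course, points in courses.items():
            others = [p for c, p in courses.items() if c != course]
            if not others:  # single-course students have nothing to compare to
                continue
            gaps[course].append(mean(others) - points)
    return {course: mean(g) for course, g in gaps.items()}
```

With a single student who earns a B+ in Public Choice and an A in each of two other courses, `difficulty_index` gives Public Choice a value of 2 and each of the other two courses -1. Students taking only one course that year are skipped, since they have nothing to compare against.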

Every year the Department would get a report listing which courses were more than one standard deviation from average difficulty, on either side. The Teaching Committee or HoD would have a chat with the lecturer in each flagged course, unless there was some particular reason to expect that course to be consistently difficult or easy. We aimed to keep our big first-year courses right at the middle and didn't mind so much if our core math-based second-year theory papers proved difficult. I think one year my Current Policy Issues course came up as too difficult: students in it earned lower grades than they did, on average, across their other courses. So I adjusted my grading a bit.
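The annual report could be sketched as follows, assuming the per-course index values have already been computed. The one-standard-deviation threshold comes from the post; using the population standard deviation across courses, and the hypothetical `expected_outliers` set for courses (like core math-based theory papers) that are allowed to sit at an extreme, are assumptions.

```python
from statistics import mean, pstdev


def flag_courses(index, expected_outliers=frozenset()):
    """index: {course: difficulty}, positive meaning harder than average.

    Returns {course: label} for courses more than one (population) standard
    deviation from the mean difficulty, skipping hypothetical expected outliers.
    """
    values = list(index.values())
    centre, spread = mean(values), pstdev(values)
    flags = {}
    for course, d in index.items():
        if course in expected_outliers:
            continue  # e.g. a theory paper we expect to be consistently hard
        if d > centre + spread:
            flags[course] = "too difficult"
        elif d < centre - spread:
            flags[course] = "too easy"
    return flags
```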

I always wished that we could do away with straight Grade Point Average measures and provide instead difficulty-adjusted GPAs. A student earning straight A+s in courses where everybody gets an A+ would then get something like a B unless the students taking those courses also did exceptionally well in their other courses.
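A difficulty-adjusted GPA might look like this: add each course's difficulty back onto the grade earned in it, then average. This is a sketch under assumptions: difficulty is measured on the same numeric scale as the grades (positive meaning harder), courses with no index value are treated as average, and the name `adjusted_gpa` is hypothetical.

```python
from statistics import mean


def adjusted_gpa(student_grades, index):
    """student_grades: {course: grade points}; index: {course: difficulty}.

    Courses missing from the (hypothetical) index are treated as average (0).
    """
    return mean(points + index.get(course, 0.0)
                for course, points in student_grades.items())
```

For example, two A+s (9 points each on a hypothetical 9-point scale) earned in courses whose index is -4 average out to 5, roughly a B on that scale.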

I think the Princeton experiment shows that edicts from on high can help a bit, but I wonder whether those gains can be sustained. I also wonder whether it unduly penalises the really hard courses that only the brilliant take. If everybody in some upper-division crazy-hard math course gets an A+, and they all deserve it because they're brilliant, the Department shouldn't be penalised for that. Strictly targeting the proportion of As penalises the students in those courses. A difficulty index would instead show that course to be of average difficulty if its students also earned an A+ in every other course, and of above-average difficulty if some lower-ability students came in and did badly.*

GPA-seekers would flip from looking for easy courses to looking for those where they'd do really well relative to other students. That's not perfect, as there are plenty of students who would be very well advised to take an essay-based course precisely because they're not very good at writing essays and need to learn (or math courses for the less mathy), but it's surely better than the current incentive to seek out courses where they have a relative advantage and the grading is also easy.

These kinds of measures can also be really helpful in setting University-wide scholarships where GPAs across Departments are otherwise not comparable. I'd always worried that our Econ students suffered in University-wide scholarships for graduate study because we kept a much tighter lid on top-end grades than did some other Departments. Difficulty-adjusted GPAs could likely have solved that problem.

And now a bit of chart-porn from the Princeton study. The x-axis shows the percentage of As awarded in the early period; the y-axis shows the percentage during the deflation period. Economics maintained a pretty low proportion of As in both periods. The top chart covers 100- and 200-level courses; the bottom, upper-division courses.

* We can imagine more complicated versions of the difficulty index where it's adjusted for these kinds of composition effects, so that the difficulty of each of the courses in which a student is enrolled is factored into the calculation of any particular course's difficulty.
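The more complicated version in the footnote can be sketched by fitting each grade as a student ability minus a course difficulty, the same additive model a commenter describes below. This sketch fits it by alternating least squares, one standard way to estimate a two-way additive model and not necessarily what Canterbury or that commenter used; the function name and data format are assumptions. Because abilities are estimated from every course a student took, a course whose students also took easy (or hard) courses is no longer judged against their other grades at face value.

```python
from collections import defaultdict
from statistics import mean


def fit_ability_difficulty(grades, iterations=200):
    """Fit grade[i, j] ~ ability[i] - difficulty[j] by alternating least squares.

    grades: {(student, course): grade points} (hypothetical format).
    Difficulties are centred on zero, since only their differences are identified.
    """
    by_student = defaultdict(list)  # student -> [(course, grade)]
    by_course = defaultdict(list)   # course -> [(student, grade)]
    for (student, course), g in grades.items():
        by_student[student].append((course, g))
        by_course[course].append((student, g))

    # Start from each student's raw average grade.
    ability = {s: mean(g for _, g in obs) for s, obs in by_student.items()}
    difficulty = {}
    for _ in range(iterations):
        # Least-squares difficulty given abilities: mean shortfall below ability.
        difficulty = {c: mean(ability[s] - g for s, g in obs)
                      for c, obs in by_course.items()}
        shift = mean(difficulty.values())
        difficulty = {c: d - shift for c, d in difficulty.items()}
        # Least-squares ability given difficulties: mean of grade plus difficulty.
        ability = {s: mean(g + difficulty[c] for c, g in obs)
                   for s, obs in by_student.items()}
    return ability, difficulty
```

On a toy panel generated exactly from abilities 8, 6 and 4 and course difficulties 1, 0 and -1, where each student takes only two of the three courses, the fit recovers those difficulties, while the plain leave-one-out index reports 1.5 and -1.5 for the hard and easy courses because it ignores that those students' other courses were themselves easy or hard. If the student-course network splits into groups that share no students, differences in difficulty between the groups are not identified at all.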


  1. How do you normalize for different intake groups? Clearly Princeton has a smarter average student than East Kentucky State...

  2. You don't. Everybody knows that they're entirely different cohorts and that an A+ student from a poor school will be different from an A+ student at a top school.

  3. Agree entirely for situations where you need to compare graduates from different schools. If enough students took courses at different universities, maybe you could run a difficulty index that way, but switchers are too likely to be different from other students for that to be very reliable.

  4. Does the Canterbury difficulty measure take account of possible inverse correlations (e.g. someone who does well at accounting will almost certainly not do well at a creative subject) or is the assumption that a student who is good at one subject is meant to be good at the lot?

  5. And students have an incentive to get the word out if they are at a meaningful-grading school.

  6. Imperfect correlation (including negative correlation) only matters to the extent that there is bunching of what courses are commonly taken together. And then there is a bigger problem, which is that a course is reported as difficult if the students in it are largely from easy-grading subjects, and vice versa. One of ECON's courses at Canterbury is very popular with Management students, and that leads to some interesting effects on the raw number for difficulty that you can understand only by looking at the more detailed data the university provides on pairwise comparisons between courses.

  7. Interesting post.

    Canterbury's system looks good. It would be better if you could do a Difficulty Index across departments.

    I once tried, with the help of an econometrician colleague, to estimate a Difficulty Index across all departments: $\text{Grade}_{ij} = \text{Ability}_i - \text{Difficulty}_j$, for student $i$ in course $j$.
    Put all the grade data in the computer and listen to the big crunching sound.

    We got estimates, but the standard errors were too large to be really useful, because, for example, arts students don't take engineering courses and too few engineers take arts courses.

  8. Has any literature been written about the effects of the Difficulty Index on grade distribution?

  9. Doubt it. Would make for a potentially fun honours project though - also check whether difficulty index changes predict enrolment changes the next year.
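The pairwise comparisons mentioned in comment 6 could be computed roughly like this, assuming (as an assumption, not the university's documented method) that each comparison is the mean grade gap between two courses among students who took both. Returning the number of overlapping students alongside each gap also shows how thin the overlap can be, which is the standard-error problem raised in comment 7. The function name and data format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def pairwise_gaps(grades):
    """grades: {student: {course: grade points}} (hypothetical format).

    Returns {(course_a, course_b): (mean gap, n students)} where the gap is the
    grade in course_a minus the grade in course_b, with course names sorted.
    """
    diffs = defaultdict(list)
    for courses in grades.values():
        for a, b in combinations(sorted(courses), 2):
            diffs[(a, b)].append(courses[a] - courses[b])
    return {pair: (mean(d), len(d)) for pair, d in diffs.items()}
```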