He says the problem was too few students who did courses in both Arts and Sciences, so identification was an issue. The basic model is this:
Student i's grade in course j = student i's smarts + course j's birdiness + random error.
I think it's called a "two-way fixed effects model with panel data". We don't observe student i's smarts, so each individual student gets a dummy variable. We don't observe course j's birdiness, so each course gets a dummy variable. That's a very large number of dummy variables for a medium-sized university, even though we did it by department and year-level rather than down to the level of specific courses.
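A two-way fixed effects model of this sort can be sketched as plain least squares on dummy variables. This is a minimal illustration with simulated data, not the actual estimation Marcel ran; all the names and numbers here are mine. One course's dummy is dropped, since birdiness is only identified relative to a baseline.

```python
import numpy as np

# Simulated data. Model: grade = smarts[i] + birdiness[j] + noise.
rng = np.random.default_rng(0)
n_students, n_courses = 200, 10
smarts = rng.normal(0, 1, n_students)
birdiness = rng.normal(0, 0.5, n_courses)

rows = []
for i in range(n_students):
    # Each student takes 4 of the 10 courses, chosen at random.
    for j in rng.choice(n_courses, size=4, replace=False):
        grade = smarts[i] + birdiness[j] + rng.normal(0, 0.3)
        rows.append((i, j, grade))

# Dummy-variable design matrix: one column per student, one per course,
# with course 0's dummy dropped (its birdiness is normalised to zero).
X = np.zeros((len(rows), n_students + n_courses - 1))
y = np.empty(len(rows))
for k, (i, j, g) in enumerate(rows):
    X[k, i] = 1.0
    if j > 0:
        X[k, n_students + j - 1] = 1.0
    y[k] = g

# One regression simultaneously estimates every student's smarts and
# every course's birdiness.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
est_birdiness = np.concatenate([[0.0], beta[n_students:]])
```

With enough students per course, the estimated birdiness tracks the true values up to the normalisation; the real-world problem, as below, is whether the standard errors let you distinguish one department from another.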
We got the data (stripped of anything that could put a name on an individual student) and Marcel fed it into a supercomputer, which made a loud crunching sound for a long time, simultaneously estimated every student's smarts and every course's birdiness, then spat out the answers for birdiness. That was the success. The model gave me a numerical estimate of each department's birdiness.
But closer inspection revealed that Operation Birdhunt had failed miserably. The standard errors were very large -- larger than the difference between any two departments' estimated birdiness. So I was unable to say, with any confidence whatsoever, that department X was more birdy than department Y.
Our version at Canterbury is a bit more relaxed. Every student's grades go into the computer. We take the student's grade in a course relative to his average grade across all courses at the same level. So if Joe gets a B average but gets an A+ in Econ 104, he's scored four points higher in 104: Econ 104 earns a +4. We then average those differences across all students in a course and call that the difficulty index. Papers that on average award grades a full grade above their enrolees' average in other courses are tagged as "easy"; "hard" papers on average award grades a full grade below enrolees' other course grades. Every academic in the University is then sent a PDF report listing the outlier papers among courses with sufficient students. In 2009, our first-year macro paper was rated as hard. We were a bit surprised, as we'd relaxed the grading standards a bit to keep our intro papers in the middle of the pack.
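The difficulty index described above is simple enough to sketch directly. This is my own toy illustration, not Canterbury's code: the grade scale and names are invented, and for brevity I average over all of a student's courses rather than only courses at the same level.

```python
from collections import defaultdict

# Hypothetical (student, course, grade-points) records, on a made-up
# scale where A+ = 9. Joe averages ~B overall but gets an A+ in Econ 104.
enrolments = [
    ("joe", "ECON104", 9), ("joe", "MATH101", 5), ("joe", "STAT101", 5),
    ("ann", "ECON104", 7), ("ann", "MATH101", 8), ("ann", "STAT101", 6),
]

# Each student's average grade across all his courses.
grades = defaultdict(list)
for student, _, grade in enrolments:
    grades[student].append(grade)
averages = {s: sum(g) / len(g) for s, g in grades.items()}

# Difficulty index: the average, over a course's students, of
# (grade in this course) - (that student's overall average).
devs = defaultdict(list)
for student, course, grade in enrolments:
    devs[course].append(grade - averages[student])
difficulty = {c: sum(d) / len(d) for c, d in devs.items()}
```

A strongly positive index tags a paper as "easy" (it grades above its enrolees' other marks); a strongly negative one tags it as "hard".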
The method we use isn't as robust as it would be if lots of students took courses across faculties: if Economics students tend to take their outside courses in departments with relatively easy grading standards rather than relatively hard ones, Econ courses will be flagged as hard more often than they should be. So we don't impose a curve on our outliers. Instead, where a course is a negative outlier without any particular reason for being one (core courses intended for honours-bound students ought to show up as "hard"), the lecturer notes the data and adjusts grading to avoid an eventual drop in student numbers (and potentially having to put on an extra course to compensate). We haven't many easy graders to worry about.
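The selection problem above can be seen in a toy simulation (entirely my own construction, with illustrative numbers): suppose Econ grades exactly at the average, but every Econ student's outside course is in an easy-grading department. Econ's difficulty index then comes out negative, wrongly flagging it as hard.

```python
import numpy as np

# Departments' true grading standards: ECON is neutral, EASY grades
# a full point high. (HARD exists but no Econ student ever takes it.)
true_standard = {"ECON": 0.0, "EASY": 1.0}

rng = np.random.default_rng(1)
records = []
for i in range(500):
    smarts = rng.normal(0, 1)
    # Each student takes ECON plus one outside course -- always the easy one.
    for dept, bump in true_standard.items():
        records.append((i, dept, smarts + bump))

# The same difficulty index as before: average over a department's
# students of (grade) - (student's mean grade across his courses).
by_student = {}
for i, _, g in records:
    by_student.setdefault(i, []).append(g)
means = {i: float(np.mean(g)) for i, g in by_student.items()}

index = {}
for dept in true_standard:
    devs = [g - means[i] for i, d, g in records if d == dept]
    index[dept] = float(np.mean(devs))

# index["ECON"] is -0.5: Econ looks "hard" despite grading at the average,
# purely because its students' comparison courses grade easy.
```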