Monday 17 January 2011

Finding the birds

Nick Rowe (this time, definitely him) describes a recent bird-hunt: an econometric search for which courses at Carleton were outliers in terms of grading standards.
The basic model is this:

Student i's grade in course j = student i's smarts + course j's birdiness + random error.

I think it's called a "two-way fixed effects model with panel data". We don't observe student i's smarts, so each individual student gets a dummy variable. We don't observe course j's birdiness, so each course gets a dummy variable. That's a very large number of dummy variables for a medium-sized university, even though we did it by department and year-level rather than down to the level of specific courses.

We got the data (stripped of anything that could put a name on an individual student) and Marcel fed it into a supercomputer, which made a loud crunching sound for a long time, simultaneously estimated every student's smarts and every course's birdiness, then spat out the answers for birdiness. That was the success. The model gave me a numerical estimate of each department's birdiness.

But closer inspection revealed that Operation Birdhunt had failed miserably. The standard errors were very large -- larger than the difference between any two departments' estimated birdiness. So I was unable to say, with any confidence whatsoever, that department X was more birdy than department Y.
He says the problem was that too few students took courses in both Arts and Sciences, so the birdiness estimates were poorly identified.
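
To make the model concrete, here is a minimal sketch in Python. It is not the dummy-variable regression Rowe's team ran: it swaps in a simple alternating-averages procedure, which should settle on the same least-squares birdiness differences when the data are linked up by students who take courses in more than one department. The function name, the record layout, and the toy grades are all hypothetical.

```python
from collections import defaultdict


def estimate_birdiness(records, sweeps=200):
    """Fit grade = smarts[student] + birdiness[course] by alternating averages.

    records: list of (student, course, numeric_grade) tuples (assumed layout).
    Returns {course: birdiness relative to the alphabetically first course}.
    """
    smarts = defaultdict(float)
    birdiness = defaultdict(float)
    by_student = defaultdict(list)
    by_course = defaultdict(list)
    for student, course, grade in records:
        by_student[student].append((course, grade))
        by_course[course].append((student, grade))

    for _ in range(sweeps):
        # Holding course effects fixed, a student's smarts is the mean residual.
        for student, rows in by_student.items():
            smarts[student] = sum(g - birdiness[c] for c, g in rows) / len(rows)
        # Holding smarts fixed, a course's birdiness is the mean residual.
        for course, rows in by_course.items():
            birdiness[course] = sum(g - smarts[s] for s, g in rows) / len(rows)

    # Only differences are pinned down, so report everything relative to one course.
    baseline = birdiness[min(by_course)]
    return {course: birdiness[course] - baseline for course in by_course}


# Toy data with no random error: smarts are 5, 6, 7 and the true birdiness
# values are arts 0, commerce +1, science -1. Every department is linked
# to the others through at least one student.
records = [
    ("s0", "arts", 5.0), ("s0", "commerce", 6.0),
    ("s1", "commerce", 7.0), ("s1", "science", 5.0),
    ("s2", "arts", 7.0), ("s2", "science", 6.0),
]
estimates = estimate_birdiness(records)
```

The sketch also hints at the identification problem. Birdiness is only measured relative to other courses, and only through students who take both. If no student links two groups of departments, the gap between the groups is arbitrary; with only a handful of linking students it is pinned down, but imprecisely.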

Our version at Canterbury is a bit more relaxed. Every student's grades go into the computer. For each student, we take the grade in a course relative to that student's average grade across all courses at the same level. So if Joe has a B average but gets an A+ in Econ 104, he has scored four points higher in 104: Econ 104 earns a +4. The average of those differences across all students in a particular course is called the difficulty index. Papers that award grades averaging a full grade above the enrolees' average in other courses are tagged as "easy"; "hard" papers on average give grades a full grade lower than the enrolees' other course grades. Every academic in the University is then forwarded a PDF report listing the outlier papers among those with sufficient students. In 2009, our first-year macro paper was rated as hard. We were a bit surprised, as we'd relaxed the grading standards a bit to keep our intro papers in the middle of the pack.
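
The calculation above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the University's actual code: the 9-point grade scale, the reading of "average" as the student's average in their other courses, the 3-point cutoff for "a full grade", and the `min_students` filter are all my guesses, and the sketch ignores the grouping by course level.

```python
from collections import defaultdict

# Assumed scale: adjacent letter grades are one point apart, so B to A+ is four points.
GRADE_POINTS = {"A+": 9, "A": 8, "A-": 7, "B+": 6, "B": 5,
                "B-": 4, "C+": 3, "C": 2, "C-": 1}


def difficulty_index(records, min_students=1):
    """records: iterable of (student, course, letter_grade) tuples.

    Returns {course: average over enrolees of
             (grade in this course - mean grade in the student's other courses)}.
    """
    by_student = defaultdict(dict)
    for student, course, letter in records:
        by_student[student][course] = GRADE_POINTS[letter]

    diffs = defaultdict(list)
    for grades in by_student.values():
        if len(grades) < 2:
            continue  # no other courses to compare against
        total = sum(grades.values())
        for course, points in grades.items():
            others_mean = (total - points) / (len(grades) - 1)
            diffs[course].append(points - others_mean)

    # Drop courses with too few students to report (threshold is an assumption).
    return {c: sum(d) / len(d) for c, d in diffs.items() if len(d) >= min_students}


def label(index_value, full_grade=3.0):
    """Tag a course; full_grade=3.0 (e.g. B to A) is an assumed cutoff."""
    if index_value >= full_grade:
        return "easy"
    if index_value <= -full_grade:
        return "hard"
    return "typical"


# Joe averages B in his other (made-up) courses and gets an A+ in Econ 104.
records = [("joe", "ECON104", "A+"), ("joe", "MATH101", "B"), ("joe", "STAT101", "B")]
index = difficulty_index(records)
```

With Joe as the only student, `index["ECON104"]` comes out at +4, matching the example, and `label` tags it "easy". Note the sign convention: a positive index means a course grades more generously than its students' other courses.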

The method we use isn't as robust as it would be if lots of students took courses from different faculties: if Economics students tend to take their other courses in departments with relatively easy standards rather than in departments with relatively hard ones, Econ courses will be flagged as hard more often than they should be. So we don't impose a curve on our outliers. Instead, those teaching negative-outlier courses that have no particular reason to be negative outliers (core courses intended for honours-bound students ought to show up as "hard") note the data and adjust their grading to avoid eventual drops in student numbers (and potentially having to put on an extra course to compensate). We haven't many easy graders to worry about.


  1. Interesting, Eric. I like your description of the Canterbury system, and the problem with it.

    Here's my hunch though: even though the Canterbury estimate of birdiness will be biased (because of the correlations you talk about), I think (not sure) that repeated applications of the Canterbury policy should cause an iterative process which converges upon all courses grading to the same standard?

    For example, suppose most (not all) students in a bird course tended to choose bird courses for most (not all) of their other courses too. The Canterbury estimate of that course's birdiness would be biased down, but it would still come out as a bit birdy. As you make all the birds tougher and re-estimate birdiness, the bias should fall, and it ought to converge over time.

    Does that sound right to you?

  2. In Econ, we use a bit of moral suasion on the one guy who grades light - we don't want to have any birds and we don't mind that some of our courses wind up being hard.

    In the previous funding model within the University, Colleges (Arts, Sciences, Commerce) had an incentive to let birdy courses keep going: money followed bums on seats, and if bird courses could attract more bums, funding would follow. So Arts in particular had a lot of birdy low-level electives.

    Now we're under capped funding: the government will subsidize only X students for the University as a whole. If one Department gets more students, that attracts more money but at the expense of other Departments in that College. So Colleges have some incentive to clamp down on birds as they make the College as a whole worse off: attracting in students from other Colleges to take bird courses doesn't increase total funding going to the College if the College is already at its ceiling (which I think we all are). So that ought to reduce the number of bird courses in Arts.

    If all that's going on is individual instructor error in grading, then our procedure lets folks iterate to a common standard.

    If it's individual error plus some folks trying to boost their numbers to compensate for other failings in their courses, then our procedure fixes things only if Departments apply pressure internally.

    If it's the above plus Departments trying to game each other within the College's capped enrollment figure, then we also need the College being willing to stomp a bit on Departmental toes to avoid prisoner's dilemma games.

    If it's the above plus Colleges that are below their enrollment cap happy to support a few birds to draw in students from other Colleges, then we need the University (via Academic Board) being willing to stomp on toes and shut down birds. I think I've heard of that having happened once.

    If it's the above plus heterogeneous standards across Departments for perhaps more legitimate reasons, then it's tougher again. The quality of the student cohort coming into Econ will be different from the cohort coming into Education. If we each give an A to the top 15%, cohort differences would generate difficulty index differences if students took courses across Colleges.

    I think best practice would be what we have now, plus making the course bird ratings public knowledge via the main University website rather than internal knowledge via the intranet and staff email lists. A more public shaming of the birds I think would help things along: no department would want to be known as the birdcage.

  3. Thanks Eric. Interesting to compare your experiences with ours. Right now, we don't have binding ceilings (we call them "corridors") on undergrad funding.

    Not sure it's a good idea to publicise birds though. It's just free advertising, so they attract even more students!

  4. If students know that employers know which courses are birds...

  5. I proposed a shaming system when I first started at UC, except I didn't know about the official bird rankings. I was just going to organise a student survey, and have the five easiest courses publicised each year in Canta (and the Chch Press of course). The Students Association has a responsibility to its members to tell them which courses are a good deal, after all, and they would be happy to organise a survey. But no need, it seems.

  6. Having a mix of domestic and international students complicates things. Domestic only, I'd be with you. If all students knew that all employers knew which were the bird courses, having it on your transcript would be a strong signal. So few would want to take the birds.

    But those reputation effects wouldn't be felt abroad; we might well expect full-fee-paying international students coming close to the minimum GPA boundary to flock to the birds (groan). The University might then have to put on some graduating requirement that no more than X credit points come from birds.