Next time around, I will add the cancer-curing magic command ", robust" to my regressions. Hoorah!

Bill: note that there are fewer observations in the lower decile groups too.

The residuals have heterogeneous variances, and you can see that they reflect the much larger variance for low deciles in the raw data. There is plenty of variation that needs more explaining, and many of us are looking forward to having access to the full data set.

Really nice analysis. Coupla things. The parameters on single-sex are interesting -- they are equivalent to more than a full decile gain at the upper end, right? On the residual plot: are the upper deciles a little tighter than the lower ones (heteroscedasticity)? If so, there's an element of reducing uncertainty in the quality of an unknown good.

Hi Thomas,
I'm hoping to move to more detailed analyses once we get the full dataset. total.roll in the csv file contains the school size and looks like this:

# summary(standards$total.roll)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
#    11.0   112.0   219.0   270.3   372.0  1978.0       1

so the typical school is somewhere near 250.

Aha. Could be. Will update.

Excellent work! Can't wait to see what people come up with when we have the full data sets.

P.S. I think "MELAA" stands for "Middle-East, Latin America and Africa", not "Melanesian".

Luis and I used different weightings. I used Stata's aweight (analytic weight) option with each school's total enrolment, because the dependent variable is an average where reporting units are of different size; Luis did something in R that weighted observations by their total enrolment to put more weight on larger schools. So we'll get different results, partially from that. I'm using Luis's data though, which does have that school size variable.
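
To make the weighting point concrete, here is a minimal sketch of an enrolment-weighted regression next to an unweighted one. It is not the Stata or R code either of us ran: the toy rows, the single decile predictor and the `weighted_fit` helper are all assumptions made up for illustration. As far as I know, analytic weights give the same point estimates as plain weighted least squares with those weights, which is what the helper computes.

```python
# Hypothetical toy data: (decile, share of students meeting the standard, total roll).
# These numbers are invented for illustration; they are NOT from the real data set.
schools = [
    (1, 0.55, 40),
    (3, 0.62, 150),
    (5, 0.70, 300),
    (8, 0.80, 900),
    (10, 0.86, 1200),
]

def weighted_fit(rows):
    """Fit y = intercept + slope * x by weighted least squares.

    Each row is (x, y, w), where w is the weight (here, the school roll).
    """
    total_w = sum(w for _, _, w in rows)
    mean_x = sum(w * x for x, _, w in rows) / total_w
    mean_y = sum(w * y for _, y, w in rows) / total_w
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in rows)
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Weight each school by its roll, then compare with equal weights.
weighted = weighted_fit(schools)
unweighted = weighted_fit([(x, y, 1) for x, y, _ in schools])

print("weighted:   intercept=%.4f slope=%.4f" % weighted)
print("unweighted: intercept=%.4f slope=%.4f" % unweighted)
```

Multiplying every weight by the same constant leaves the coefficients unchanged, so only the relative sizes of the schools matter for the point estimates; the two fits differ whenever large and small schools sit off the same line.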

Eric,

How does the residual variance compare to the binomial variance? Luis is getting residual standard errors about 1.5 (percentage points) from simpler models, and at p=0.75, that would correspond to n of about 750. Is the typical school really large enough that there is a lot of extra-binomial variation to explain on top of this model?
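
The back-of-envelope arithmetic behind that comparison can be sketched as follows. It assumes each student's result is an independent draw with the same probability p, and that n is the whole roll (the number of students actually assessed may be smaller); the function names are just for illustration. For a standard error of exactly 1.5 points the formula gives an n of roughly 830, the same ballpark as the figure quoted above.

```python
import math

def binomial_se(p, n):
    """Standard error of an observed proportion p based on n students."""
    return math.sqrt(p * (1 - p) / n)

def implied_n(p, se):
    """Number of students at which pure binomial noise gives standard error se."""
    return p * (1 - p) / se ** 2

p = 0.75

# Roll size implied by a residual standard error of 1.5 percentage points.
print("implied n:", round(implied_n(p, 0.015)))

# Binomial standard error (in percentage points) at the quartiles of
# total.roll reported in this thread, and at n = 750.
for n in (112, 219, 372, 750):
    print(n, round(100 * binomial_se(p, n), 2))
```

If the binomial standard error at typical roll sizes is already larger than the residual standard error, there is little extra-binomial variation left to explain, which is the point of the question.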

[What I wanted to do first with the data was some funnel plots to see how much unexplained variation there was, but I didn't because the file didn't have sizes for each school.]

It is an indicator variable for schools that provide residential services. It does not cover special schools, which were excluded.

Is 'boarding school' just a binary variable? If so, it could be very misleading, as almost all, if not all, boarding schools take day students as well. For example, my high school was technically a boarding school, but only around 10% of the students boarded there.