Showing posts with label Joel Hernandez. Show all posts

Monday, 10 August 2020

Education departments are weird

So our Joel Hernandez has completed some more work on what's going on in New Zealand's school system, and an Auckland Uni education prof is mad about it.

Oh well. 

Joel's long term project has been to look at differences in outcomes across students and schools, using the administrative data held in the StatsNZ data lab to adjust for a rather broad assortment of things that students bring with them into the classroom. 

Naive league tables will credit, or damn, schools for outcomes that are largely due to differences in the communities that those schools serve. Getting better measures on outcomes, adjusting for the differences across families that we can see in the data lab, helps. 

Our measures don't tell you what's going on in any particular school, but they do let you know whether a school is doing about as well as expected in the current system given the kids it teaches, or whether it's a place that the Education Review Office might want to go visit to see what's going on. It could be that better-than-expected performance in one school has nothing to do with that school's practices but instead has everything to do with an after-school tutoring club the parents set up - for example. 

Earlier, Joel looked at differences across schools to show that most of the difference in public school performance, by decile, disappears when you account for the differences in the families those schools serve. Piles of low-decile schools showed up as top performers, if you run the stats properly. 

The broader project, which will take some time because lab work is onerous and we only have Joel doing this work for us, will extend to a much broader set of outcomes going beyond NCEA. The Ministry of Ed has like 3000 staff; we have a bit over a dozen staff and we only have Joel in the lab. 

I'm keen to know how different schools vary in stuff like Not In Education, Employment or Training (NEET) status in years following high school completion; tertiary completion; salaries a few years after completing high school; crime rates; benefit uptake - there's a lot that can be looked at. But it'll take a while. We start with one set of data matches and build outward from there, adding things on as we go. 

Anyway, Joel's most recent project looks at whether there are differences in outcomes across state, state-integrated, and private schools. Not many kids go to private school in New Zealand, but private and integrated schools dominate the league tables for achieving University Entrance. 

Because of the cost of private schools, they're mostly going to be attended by kids from richer families. We're still looking at outcomes observable in school data available in the data lab. School data includes every student's performance on every NCEA standard they've sat, and whether they've achieved University Entrance. But it doesn't include data on whether they took up options available in some private schools to attend International Baccalaureate classes instead, or to take the Cambridge exams instead of NCEA. So Joel looked at UE as the basis for comparison. 

And remember that the broader project will eventually get to a lot more outcomes. Those take time, and we have one econometrician on the job. 

Joel found that state-integrated schools outperformed state and private schools on University Entrance, adjusting for all the family background characteristics observable in the lab. You can't adjust for everything in the lab, but the stuff you can adjust for, like parents' education, will also be correlated with some of the things you can't observe. 

So, for example, the weight and value parents put on education can matter a lot, but you can't observe that in the data lab. If parents who put the highest value on education will both push their kids harder at home, helping them through, and be more likely to select out of public schools and choose an integrated or private school, then you could be unfairly crediting private schools for effects that come from family background. But, at the same time, if, on average, the parents who put the highest value on education also have high levels of education themselves, then you'll have mopped up some of the effects of "parents value education" by controlling for parents' own education. It isn't perfect, but so long as the unobservables correlate positively with the observables, then you've handled some of that selection issue. 
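To make that concrete, here's a toy simulation, with entirely made-up numbers and nothing from the actual data lab, where an unobserved "parents value education" variable drives both school choice and outcomes, and where controlling for a correlated observable, parents' own education, soaks up part of the selection effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Unobservable: how much parents value education (not in the data lab).
value_ed = rng.normal(size=n)
# Observable proxy: parents' own education, correlated with the unobservable.
parent_ed = 0.7 * value_ed + rng.normal(scale=0.7, size=n)
# Parents who value education more are likelier to pick a private school.
private = (value_ed + rng.normal(size=n) > 1.0).astype(float)
# True outcome: the school adds nothing; all the lift comes from the family.
outcome = 1.0 * value_ed + rng.normal(size=n)

def ols_coef(columns, y):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols_coef([private], outcome)[1]
adjusted = ols_coef([private, parent_ed], outcome)[1]
print(f"naive private-school 'effect':    {naive:.2f}")
print(f"after controlling for parent_ed:  {adjusted:.2f}")
```

The naive coefficient overstates the school effect; adding the correlated proxy shrinks it substantially, though some residual confounding remains. That's the sense in which correlated observables bound the selection problem rather than eliminate it.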

Or, at least, you've somewhat bounded it. Take a very different area: the persistent arguments about whether unobserved confounds drive the J-curve in alcohol and health. If adjusting for all of the observable health behaviours you can find doesn't do much to reduce the J-curve, and those observable health behaviours are really likely to be correlated with unobservable health behaviours, then it isn't plausible that unobserved confounds are driving the rest. Here, adjusting for the kitchen sink of family background reduced the coefficient on state-integrated schools but hardly got rid of it. You'd need the effects of the unobservables that aren't already mopped up by the observables to be as big as the effects of the observables, and to have driven the selection of private over state schools, to knock out the effect of private schools - and you'd need huge effects of unobservables to take out the effect of state-integrated schools. 

Anyway, here's Auckland Uni Prof of Ed Peter O'Connor on it all:

However, University of Auckland Professor Peter O'Connor argues that the focus on UE results alone "reduces the complexity of learning to such a narrow construct that it becomes meaningless".

"As a professor of education, I'd fail my master's students on that. It's a false science," he said.

"There is nothing to suggest that going to a private school means you will be happier, lead a more purposeful life, contribute more to the world, have better relationships with your partner or your children. That in fact your life matters for having been lived.

"You might have better connections, even more money, but that isn't much in the grand scheme of life."

It's kinda funny. We never said anything about happiness, leading a more purposeful life, or any of that. We were just looking at the average effects of integrated and private schools as compared to state schools on university entrance - a metric a lot of people still do care about, and the one that it's possible to check in the lab. We will be broadening to more outcomes in future. 

But I guess if you're a student contemplating doing grad work in education that has any kind of econometrics in it, you probably shouldn't pick O'Connor as supervisor. If he doesn't like whatever numbers you get, he might fail you because he didn't understand the study or the methods. Michael Johnston over at Vic's education department would be a way better choice if you wanted to do it in an Ed department. 

Or, perhaps even better, do it in economics. 

Either way, you can even start with all the code Joel's used to run the data matches - we've got it all up in there freely available for anyone else to build on. There are years and years worth of studies to be done in there, and we've only got Joel. It's basically the best administrative dataset in the world for linking high school students' grades, their family backgrounds, and their later life outcomes. Drop us a line if you're considering picking up on any of this in your thesis work - we're always happy to provide a bit of advice. 

Monday, 15 April 2019

Easton on schools

A recent contribution has been from the NZ Initiative’s report Tomorrow’s Schools: Data and Evidence. [EC note: I've updated the link to the Initiative's site rather than Scoop] Unfortunately it is only a note of six pages, which does not meet the standards of a research report, so I can hardly comment on the quality or veracity of the findings.

The note observes there are performance differences among schools (it uses NCEA attainment as a measure). No one is surprised that higher-decile schools outperform lower-decile schools by a large margin (on average). However, once the NZ Initiative adjusted for the effect of family background (they don't explain how), they found that the average differences in education outcomes across school deciles disappear. The report concludes that the inequality in education outcomes evident in school league tables is not a result of large differences in school quality, but rather of large differences in family background, particularly differences in parental education.

The NZ Initiative concludes that their ‘research’ demonstrates that the current schooling system is working and should be retained. Maybe; one wants to see the research first, especially as it contradicts the international literature. (I can think of a number of ways one could do the exercise – not all of them would be valid.)

What strikes me is that the NZ Initiative barely observes that the research suggests that the main source of educational inequality (and a whole lot of life opportunities which follow on from it) is ‘family background’, whatever they mean by that. The implications for inequality are hardly explored. As far as I can infer, the NZ Initiative is so besotted with defending the competitive model of schooling it is uninterested in the wider questions of the sources of and policies for children’s opportunities; issues central to the egalitarian society. That, I think, captures a deep attitude of the elite right; ‘who cares about social inequality providing we are doing all right’.

Indeed there is celebration of inequality when the rich display their wealth. Of course there was inequality in the egalitarian society before 1985, but it was rare for the rich to show it, to display, what Thorstein Veblen called, ‘conspicuous consumption’. After 1985 it became common to flaunt how rich you were.

Our analyst, Joel Hernandez, spent about a year in the IDI lab on this one.

The mission we gave him: start by figuring out how much of the variability in school performance is due to things outside the school's control, like family and student background. Current league tables could easily mostly be picking up parents' education or income. We need to be able to find the schools that are doing a superb job despite difficult circumstances, so that we can learn from them. The measures out there just aren't up to spec for doing that kind of work.

So he spent the last year merging a ton of administrative data sets and cleaning the data. It is not a small job.* For the population of students who completed NCEA from 2008 through 2017, there's a link through to their parents. From their parents, to their parents' income. And their education. And their benefit histories. And criminal and prison records. And Child, Youth, and Family notifications. And a pile more. Everything we could think of that might mean one school has a tougher job than another, we threw all of that over onto the right hand side of the regressions.

It's student-level observations with a ridiculous number of control variables enabled by the data linkages in the lab.

The point of the exercise wasn't to precisely identify coefficients on each of the independent variables. The point rather was to mop up all of the variation that comes from family circumstances. There's no structural equation modelling here or any attempt at getting at causality among those variables - just a giant reduced-form kitchen sink.

Plus, five hundred or so indicator variables for each of the country's secondary schools.

On the left-hand side - a few measures of performance at NCEA. But that's just the starting point. We're going broader. Employment after graduation. Income after graduation. Progression to tertiary. NEET status (Not in Education, Employment or Training). Benefit uptake. We could even put future criminal activity in there. So far, it's just NCEA though.

After separating out all of the variability that comes from family background, the coefficient on each of the schools' indicator variables tells you the average effect of that school on outcomes.
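As a sketch of that setup, with synthetic data, five schools standing in for the five hundred, and three made-up family controls rather than the full kitchen sink:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_schools = 20_000, 5

# Hypothetical true value-added for each school (school 0 is the baseline).
true_effect = np.array([0.0, 0.3, -0.2, 0.1, 0.0])
school = rng.integers(n_schools, size=n)
# Family background controls (stand-ins for parents' education, income, etc.).
family = rng.normal(size=(n, 3))
# NCEA-style outcome: mostly family background, plus the school's effect.
y = family @ np.array([0.8, 0.4, 0.2]) + true_effect[school] + rng.normal(size=n)

# Design matrix: intercept, family controls, school dummies (baseline dropped).
dummies = np.eye(n_schools)[school][:, 1:]
X = np.column_stack([np.ones(n), family, dummies])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
school_coefs = beta[4:]  # last four entries: each school's effect vs school 0
print("estimated school effects vs baseline:", np.round(school_coefs, 2))
```

With the family variation mopped up by the controls, the dummy coefficients recover each school's average effect relative to the baseline school.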

Our plan had been to put up the big report in July(ish) with all of the method and the first set of results. Then, short reports would follow regularly on different outcomes.

But then the Bali Haque report came out. The report said that there are huge differences in student outcomes across schools, that those differences showed up as differences by decile, that decile differences were inequitable, and that the entire school system needed to be overturned because of it. Currently self-managed schools operating under school boards would be replaced with hubs managing dozens of schools.

There are indeed very real problems in board governance in some failing schools. It's something that features regularly in stories of persistent school failure. But if the justification for abolishing school boards and putting in place a big new governance structure is strong differences in school outcomes by decile, well, we know that that wasn't the case.

So we moved. Because we're a think tank. We brought forward the short report on the variability in outcomes by decile after separating out the family background effects.

And then Brian thinks we're hiding stuff or trying to downplay the family circumstances, perhaps trying to hide the evidence that big income redistribution schemes are warranted.

It would have been irresponsible of us to put up the coefficients on the other correlates. We have a kitchen sink of variables to mop up effects, not to precisely identify the coefficients on any of them. Putting up all of those coefficients without checking their sensitivity would have been premature. We control for whether the child is from a single-parent household. Whatever the sign on the coefficient, it would feed into culture wars around divorce and the desirability of two-parent families. We control separately for mothers' and fathers' incomes, and mothers' and fathers' educations. Results could feed into arguments around two-earning families. Most of those control variables would fuel one interest group or another. And even if we were sure we had the numbers right, they're still not causal. If you find that kids of well educated parents perform better at NCEA, that doesn't mean you should start giving degrees to parents to boost their kids' chances.

We'll have more in the full report. What we have so far though lends zero support to arguments around redistribution. Parents' education matters a ton. Income - not so much when education is controlled for. But we need to play with it more before we say anything more. If we have to suffer Easton's grouchiness for it, oh well.

But our object here is the exact opposite of Easton's imaginings. If we can identify the schools that do a fantastic job with kids that other schools have a hard time helping, that means the Ministry or ERO could go into those schools and see whether they're doing anything differently from schools that aren't doing such a good job. Sure, it would take a policy change around operational use of IDI. But it is entirely doable. Learning from that can help lift performance for the kids that too many schools are currently failing.

And there are all kinds of ways of handling it.

Within the current model, you could get reports from the Ministry to every school board telling them where they're doing well, where they're doing poorly, and which schools they might want to learn from (and which might need their help). The reports would help empower school boards that cannot tell whether poor outcomes are because the community is disadvantaged, or because the Principal is failing. And if the data were available to the parents, that could encourage parents to take a more active role in board governance in places where there is underperformance. Both voice within the school, and exit from underperforming schools, could help encourage better performance. And don't pretend that this is bad because the status quo is some paradise where all the schools are doing great and everyone sends their kid to the local school. Right now, parents use worse proxies for school performance and will happily walk by an excellent low decile school to get their kid into a higher decile school that's further away. The local school might be the one that could do more to help their kid. But we can't tell without better data.

Within a hub-based model, the reports provided by the Ministry could help the overarching structure to manage performance among their schools, to send investigators in to figure out why one school is doing particularly well in ways that nobody had noticed before, and to use what they learn to help others. They could use it to test the effects of different kinds of practice on outcomes. In the data lab, what goes on in the schools is a black box. We just don't know. But the hubs might know that one school never shifted to modern learning environments and the other one shifted to them 6 years ago. It could look at whether those kinds of policies had any effect.

Either way, it would also help the Ministry in similar ways. It could help ERO check whether any of their interventions improve student outcomes.

There is just so much that can yet be done with better use of the data we have. I've been pointing to it for years. Nothing's being done about it. The Ministry has a staff of 3000; we have Joel. We don't have time to do all of it. That's one reason we're opening up all our code in the lab for others to build on.

Imagine if every guidance counsellor in every school received a report from the Ministry for every student. The report for each student finishing Year 10 would say "Here are a thousand kids who looked a lot like you 5 years ago, and another thousand who looked a lot like you 10 years ago. Here are the choices they made about paths through school, through to tertiary or vocational training, and their later employment outcomes. Here's what the kids like you who chose a Bachelor's Degree are doing now. Here's what's happened for those who chose vocational training. Many of these choices may never have occurred to you." It is entirely feasible to do this right now. It would take a few months' coding. After that, it's just push-the-button. And it isn't being done. 
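A minimal sketch of how such a report could be generated, with a made-up cohort and plain Euclidean distance standing in for whatever matching the Ministry would actually use:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical historical cohort: a Year 10 profile (grades, attendance, ...)
# plus the pathway each student chose and an outcome observed years later.
profiles = rng.normal(size=(5_000, 4))
pathways = rng.choice(["degree", "trade", "work"], size=5_000)
outcomes = rng.normal(loc=50_000, scale=8_000, size=5_000)

def similar_students(profile, k=1000):
    """Indices of the k historical students closest to this profile."""
    dist = np.linalg.norm(profiles - profile, axis=1)
    return np.argsort(dist)[:k]

# For one current student: summarise what happened to the thousand most
# similar past students, broken out by the pathway they chose.
current_student = rng.normal(size=4)
idx = similar_students(current_student)
for path in ("degree", "trade", "work"):
    mask = pathways[idx] == path
    print(f"{path:>6}: n={mask.sum():4d}, "
          f"median outcome ${np.median(outcomes[idx][mask]):,.0f}")
```

Once the matching code exists, producing a report per student really is just push-the-button: the same lookup runs for every Year 10 profile.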

What difference could it make for a kid who never considered university a possibility, because of the community she grew up in, to see that other kids with similar academic records had done brilliantly at uni and that they'd better push to do the UE courses? What difference could it make for a kid whose parents were pushing university to see that kids with comparable records did far better pursuing a trade? Better information has to help.

We have a pretty big work programme here on deck. Once Joel's code is up in the data lab, I'll put up a note about it. I've already been in touch with friends back at Canty. One substantial barrier to assigning IDI projects for Masters thesis work is that you spend a year in data cleaning and matching (and just learning your way around) for any big project. You can't dump that on a Masters student without strong risk that the project falls over. But you can assign projects that build on an existing codebase.

Academics won't put up their code because their incentive is the opposite of ours. They'll want to get the vita lines on every possible way of dicing the data after fronting the fixed cost of merging it. Just look at access around the Dunedin Longitudinal survey, or some of the others out there. Tons of publicly funded work locked up for the benefit of those who ran the surveys.

We want ours to be as open as possible within the constraints set by StatsNZ around the data lab. We want way more people using that code base to see what's going on in education. And if any of them find ways of improving the code to improve match rates, even better!

Our work here is a starting point.

* Even worse, it seems to be a much-repeated job, with anyone doing work in the area duplicating efforts. Joel will be getting all of his code up into the StatsNZ wiki for others to build on - the process for getting it in there isn't trivial though.