Offsetting Behaviour: open data

Thursday, 21 May 2020

Tax attitude data - the IRD OIA slowly progresses

It's been a long saga, and it isn't over yet. But the end is in sight.

On 12 February, 2019, I put in an OIA request for the data from the polling that IRD had commissioned from Colmar Brunton on tax attitudes.

On 12 March, 2019, IRD declined the request on grounds that it would be considered sensitive tax data. I brought the matter to the Ombudsman the next day. But it later turned out that IRD had ordered the data destroyed.

On 1 November, 2019, IRD gave me revised grounds for having refused the request: that the data had been destroyed.

I had a chat with the Archivist's Office about what's required for that kind of destruction of public records, then went back to the Ombudsman.

On 12 March, 2020, the Chief Ombudsman provided a substantial slap to IRD and directed them to get the data.

Some particularly nice bits from that report:

You have acknowledged that IR’s original decision to refuse the request on this basis was incorrect. I agree. It is difficult, in circumstances where IR has not reviewed the information at issue, to be satisfied that its disclosure would be contrary to the Tax Administration Act.

This is particularly nice because IR claimed its original decision was wrong on the basis that the data had been destroyed and consequently couldn't be supplied. They didn't say that their original justification for withholding the data, had it existed, was incorrect. The Ombudsman here is telling them that their withholding as tax secret was wrong full stop.

IR has contended that, in retrospect, Clause 4.3.2 of its Disposal Authority, DA418, authorises the disposal of this kind of information. It is possible this is correct. However, as IR did not document its decision of 9 February 2019, I am unable to satisfy myself that IR properly turned its mind to its Disposal Authority when it decided to instruct Colmar Brunton to delete the political leanings data. It is, in short, not clear that its disposal was properly authorised.

I therefore consider that IR was obliged, in the circumstances set out above, to contact Colmar Brunton to seek access to any backups of the political leanings data held by it or Dimensions while considering Dr Crampton’s request. In the absence of such efforts, I am not persuaded, subject to your further comment, that IR was entitled to refuse the request under section 18(e) of the OIA.

The Chief Ombudsman also referred the matter to the Chief Archivist to check whether IRD was also in breach of its obligations under the Public Records Act.

The Ombudsman asked that, by 9 April 2020, IRD provide him with the steps they would take to give effect to his recommendation.

On Tuesday, 19 May 2020, IRD provided me this update:

Dear Dr Crampton

I am writing to advise you of our next steps with regards to your complaint to the Ombudsman.

As advised by the Ombudsman in his final letter, we requested the data for the Trust in IR research survey from Colmar Brunton. We are currently working through the data and will let you know the result of your request for the data in due course.

Kind regards

Government & Executive Services

So, fifteen months after my original request, IRD says that they're working through the data to see what they can give me. Stay tuned. I'll have to set a calendar flag to follow this up in 20 working days if I've not heard anything.

Tuesday, 29 October 2019

Locked data filing cabinets

Over at Newsroom, I argue for opening up some of New Zealand's locked government data cabinets as part of government's maintaining social licence to collect it in the first place.

Since I've been a bad blogger lately and didn't get this up when it first came out, I'll put the whole thing here rather than just a snippet. Enjoy!

When Arthur Dent complained that he had not been informed of Council’s plans to bulldoze his house for a bypass, Mr Prosser, the Council officer, calmly told him that the plans had been on display for months - in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’. Arthur found the plans there the day before the bulldozers showed up at his door.

A lot of New Zealand’s government data feels about as inaccessible as the council plans that Arthur Dent eventually found in the classic Hitchhiker’s Guide to the Galaxy. And, just like those ‘open’ council plans for Arthur’s house, government data can have consequences in the real world. Opening the data up could do rather a lot of good in building trust and enabling better policy.

Last week, InternetNZ held its annual NetHui at Te Papa. The event brings together a motley assortment of tech geeks and aficionados, policy wonks, social justice warriors and tech law experts. The sessions at NetHui can be rather wide-ranging, and you get the chance to chat with a lot of people you might not otherwise run into.

One theme running through rather a few sessions was a feeling that Kiwis are subject to Big Data rather than participants in it who are enabled to use data to help them to understand and shape the communities in which they live. That feeling builds resentment of the use of data in policymaking.

And it isn’t hard to understand why.

The formal barriers to accessing a lot of government data are rather substantial. New Zealand’s most sensitive data is held in secure data labs under highly restricted access. Researchers wanting to use it must prove that they have sufficient training in statistics and can only use the data within approved research projects. Very sensitive personal data held in the data lab, even though it is anonymised, has to be well-protected.

Even confidentialised random samples from larger datasets, where identifying details have been stripped out, might as well be behind signs warning to Beware of the Leopard. Access to that data is rather difficult.

But even for data that is properly open, data literacy can be a substantial barrier to participation. You have to know what data you need for any particular question, where to find it, how to get it, and then how to analyse it. Not everyone has a copy of Excel, let alone more powerful statistical software like Stata (expensive, powerful and simple to use) or R (free, powerful, and with a big learning curve).

The combined barriers mean that, for a lot of people, government data is something that’s done to them rather than something they can really use. It is even true within government. Our think tank has done a fair bit of work on education; a fairly regular, and accurate, complaint from school leaders is that while they spend countless hours in submitting data up to the Ministry of Education, they receive little back from the Ministry that could help them in improving their schools. The view from those outside of the system can be bleaker.

It all makes for a difficult problem. Social licence for data like that held in the Statistics New Zealand Integrated Data Infrastructure would disappear if anyone’s personal details were ever compromised. But social licence for that data can also disappear when the people whose details make up that data see little benefit in it and are locked out of it.

But there is a way through it.

Well over a decade ago, when I was Senior Lecturer in Economics at Canterbury, I assigned projects using sensitive microdata. The students in my course on Public Choice had to check whether survey respondents’ political policy preferences tended to line up with their personal interests and whether, in the broad, policy tended to match public opinion.

We were able to do that, using American General Social Survey data, because of a wonderful web interface hosted by the University of California at Berkeley. The survey includes a lot of very sensitive personal questions, ranging from income and health to sexuality and policy preferences. Berkeley’s web interface allows anyone in the world to run simple statistical tests without ever having to see the confidential data that sits in the database. The students could check whether survey respondents’ income, or those respondents’ education, or their scores on a vocabulary test, were stronger predictors of policy preferences.

But nothing comparable exists in New Zealand, even a decade later. Anyone wanting to run even simple checks on important policy questions must jump through hoops impossible for most people to hurdle. Meanwhile, over 100 other countries have signed up with the University of Minnesota’s Integrated Public Use Microdata Series (IPUMS), which provides the same kind of simple web tool as Berkeley to enable people to use their own data. It is easier for most Kiwis to access and use other countries’ data than it is to access their own.

That builds resentment among those who, unfortunately correctly, see themselves as having data done to them rather than something that enables their own civic participation. When only the anointed few are allowed the key to the proverbial locked filing cabinet in the basement, and the plans that they work on there matter for policies that affect people’s lives, it is not that surprising that the subjects of that data would prefer to layer on even more controls restricting access, or blow up the filing cabinet entirely.

More substantially opening up New Zealand’s locked data filing cabinets to enable people to use their own data would not just help ensure social licence for that data, it would also strengthen government accountability. When anyone with a web browser can run simple checks on whether changes in policy improve outcomes, it is easier to avoid the kind of surprise that Arthur Dent found lurking in the planner’s disused lavatory.

Thursday, 6 December 2018

Morning roundup

The morning worthies:

Why do academics resent markets? Nozick was right.

Chris Ruhm on shackling the identification police. No, this isn't about the TERF wars. It's more important than that. Ruhm worries that the need for clean identification constrains the questions empirical economists now investigate. I worry that it also means multi-year lags in getting reasonably but not perfectly identified work published in economics, while public health work that thinks you can solve endogeneity with a logit gets published in PLOS One or The Lancet within minutes, so it can be years before errors get corrected because econ makes the perfect the enemy of the good.

New Zealand's Chief Censor is building his case for regulating porn on the internet. I hope he realises that he can only do more harm than good in this area before he does harm.

Ever get frustrated that Statistics New Zealand only ever releases cross-tabs that somebody working there thought would be important, and that getting other cross-tabs is a huge hassle? Well, IPUMS has solved that over in American data. Here's the mobile-friendly version of their site letting you build whatever cross-tabs you like out of ACS data. And remember that IPUMS hosts data for other countries too, so we could totally have access to this kind of facility instead of waiting for the never-will-come made-in-NZ variant. I'd yell again about the lack of open data in NZ, but Stats seems to be falling apart in even trying to get the last Census out, so they've bigger things to worry about right now.

Martin van Beynen reports on costs of the government's student loan scheme. Comments from me included, but Martin gives me an undeserved promotion.

Thursday, 8 June 2017

Open data - Free the CURFs

Koordinates hosted a fun event yesterday on open data.

Ed Corkery, Koordinates CEO, opened with a talk on the potential of open data in GIS-space, once it's actually opened up in useable form. Too much data winds up being inaccessible or unusable; Koordinates works to try to make that data more easily used.

Statistics NZ's Government Statistician, Liz MacPherson, presented next. She has a great vision for where open data should be at. We're not there yet, but I like the kind of future she's describing. In that world, IDI users share all of their code. That means that you don't just get replicability and a lot more potential for error-catching, you also get standardised bits of code that can get dropped into projects. So if one person's already run the code that matches, for example, students' NCEA records to later income tax data, somebody else can just grab that bit of code rather than have to re-create it. There's recognition that Stats needs to be careful not to break its current social licence as a trusted repository of data, but that there are ways of doing that while also being far more open than Stats has been.

It's work in progress, with all kinds of real technical hurdles. Antiquated back-end systems generate reports that turn into the current tables, with dependencies all over the place, making it tough to shift towards the more flexible and dynamic environment that would allow cross-tabs to be generated on the fly. Get far enough into that world, and you don't even need Confidentialised Unit Record Files any longer. Instead you can get privacy and confidentialisation on the fly that scales confidentialisation to the risk of deanonymisation given the kind of data being extracted.
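For a rough sense of what generating cross-tabs on the fly with built-in confidentialisation might look like, here is a minimal sketch. It is not anything Stats NZ or IPUMS actually runs: the `crosstab` function, the `MIN_CELL_COUNT` threshold and the sample records are all made up for illustration, and a fixed small-cell threshold is a much cruder rule than the risk-scaled approach described above.

```python
from collections import Counter

# Hypothetical threshold: cells with fewer respondents than this are hidden.
MIN_CELL_COUNT = 3


def crosstab(records, row_var, col_var, min_count=MIN_CELL_COUNT):
    """Count records by (row_var, col_var), suppressing small cells.

    The caller only ever sees aggregate counts, never the unit records.
    A suppressed cell is returned as None.
    """
    counts = Counter((r[row_var], r[col_var]) for r in records)
    return {
        cell: (n if n >= min_count else None)
        for cell, n in counts.items()
    }


# Made-up unit records, for illustration only.
records = (
    [{"region": "North", "tenure": "own"}] * 5
    + [{"region": "North", "tenure": "rent"}] * 4
    + [{"region": "South", "tenure": "own"}] * 1
    + [{"region": "South", "tenure": "rent"}] * 6
)

table = crosstab(records, "region", "tenure")
for (region, tenure), n in sorted(table.items()):
    print(region, tenure, "suppressed" if n is None else n)
```

The single South/own respondent is hidden while the larger cells come through. A real system would need to do more than this; for instance, publishing row and column totals alongside a table like this one could let somebody back out the suppressed cell.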

But things are moving.

One easy thing that Statistics NZ could do, as an interim measure while everything else is going on, would be to open up the CURFs. And that's a good chunk of what my talk focused on. I just can't see any good reason that these things are still locked up behind difficult access barriers when America's PUMS are all available to anyone in the world who has a browser.

Liz described me in her talk as one of Stats NZ's most vigorous critics. But I love Stats NZ. I just get frustrated that the CURFs have been locked up forever, and that what we're able to do here is so far behind what can be done with American data because of the access controls. And that's especially frustrating when so much NZ data held in IDI is so much better than what's available in American data. It'll take time to sort things out on the back-end so that we can get front-end interfaces that match what IPUMS is already doing in American data (and that Berkeley's SDA engine has been doing for ages now), and some of that is unavoidable where there are resource constraints and a pile of old systems that need sorting out.

But why not open up the CURFs in the interim? It would also help signal the change in approach at Stats, in line with the Government Statistician's vision of real open data.

Flip the switch! Free the CURFs! And, just to be on the safe side, put in a big real penalty for anybody who takes the CURFs and uses them to re-identify individuals.

Update: Oh My.

Achieving the vision will take us all. On the CURFs - watch this space!! #opendata https://t.co/FcYp9P7Yt3
— Liz MacPherson (@GovStatistician) June 8, 2017