Monday, 16 November 2015

Free the data

Why can't New Zealand public data be as publicly available as American data? Here's a snip from my column from
Academics in New Zealand wanting to use individual-level publicly collected data to look at questions of public policy interest have two basic options. They can request a Confidentialised Unit Record File from Statistics New Zealand for the Census or other collected data series, or they can request access to the data lab to get closer to the raw data.
Where America makes its anonymised samples free for anyone to download, New Zealand restricts things. The application form requires that you only use the data for the exact purpose you specify and limits how you use it.
Sometimes it is worth the costs of making that application, but a lot of the time a researcher might want to run a very simple correlation test or check whether average outcomes differ between two groups, just to see if there is any reason to go any further.
And much of the benefit of easily accessible public data is in simple myth-dispelling. Newspaper columns make plenty of assertions that are testable – at least in principle. Debunking them is not worth the hassle when it requires a specific application and approval process – even though Statistics New Zealand is exceptionally helpful through those processes.
If New Zealand followed the United States in providing simple and accessible confidentialised public-use microsamples of its large datasets, researchers could undertake the kind of exploratory data analysis that leads to bigger projects down the track. Sometimes, we need to let the data tell us which hypotheses to test – and we need access to the data to be able to do it.
But, a New Zealand researcher applying for access to a confidentialised unit record file has to promise never to use it for anything other than the purposes stated on the application form, to keep the data secure (using it on laptop computers is effectively banned), and to destroy the data at the end of the research project or at the end of the 12-month licence.
When rich American data is available to anybody with a web browser and an internet connection, while similar New Zealand data is rather difficult to get and to use, is it any surprise that a lot of New Zealand academics, paid in part by the New Zealand government, use American data rather than the New Zealand data that New Zealand taxpayers have already paid to collect?
And that is the self-inflicted wound.
On one side we have rich data that could throw light on countless interesting public policy questions, excellently and expertly collected and curated by Statistics New Zealand.
On the other we have an army of academics whose continued employment increasingly depends on landing refereed journal articles. Those academics chose to live in New Zealand rather than elsewhere, and many of them would love to use New Zealand data to shed a bit of light on policy discussions that are too often devoid of evidence.
In between the data and the researchers we have the Statistics Act 1975 that binds Statistics New Zealand. To make things even worse, Statistics New Zealand will not release an anonymised data set to any researcher not based in New Zealand. Researchers from around the world help Americans to understand how their country works because they have free access to American data. New Zealand hoards its data like Tolkien’s dragon guards its gold.
Just look at the ACS data and how easy it is to access. Go to IPUMS, set up a free account, click the box that agrees that you will use the data for good and not for evil*, and then you can use the same SDA interface for the ACS data that you're of course well familiar with from the US GSS data. You can run regressions on the 2013 ACS sample, or cross-tabs, within the web browser. To do the same thing in New Zealand, you would have to specify exactly what you wanted to do and get prior permission for each thing.

* There seriously is such a box.

1 comment:

  1. Interesting that Land Information makes GIS data so easily accessible. In fact, I recall seeing a discussion among some Australians where someone who worked in the field said something along the lines of "no government would ever make that kind of information freely available" (I forget the stated rationale, something to do with the data being too important or some such). I helpfully pointed out that NZ does exactly that, and was met with a response of, "I guess they do things differently over there."

    Land Information's policy documents concerning opening up data and the associated benefits may also be useful if trying to convince NZ politicians to approve a statutory amendment.