Showing posts with label Dunedin Longitudinal Survey. Show all posts

Tuesday, 20 December 2016

The targeted cohort

If you're surprised by the latest results from the Dunedin cohort study, you haven't been paying attention:
We integrated multiple nationwide administrative databases and electronic medical records with the four-decade-long Dunedin birth cohort study to test child-to-adult prediction in a different way, using a population-segmentation approach. A segment comprising 22% of the cohort accounted for 36% of the cohort’s injury insurance claims; 40% of excess obese kilograms; 54% of cigarettes smoked; 57% of hospital nights; 66% of welfare benefits; 77% of fatherless child-rearing; 78% of prescription fills; and 81% of criminal convictions. Childhood risks, including poor brain health at three years of age, predicted this segment with large effect sizes. 
A relatively small group generates the preponderance of social cost. And it's G-loaded. A rough measure of child intelligence at age 3 predicted a lot of bad outcomes.
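
To get a feel for how concentrated those costs are, the quoted shares can be turned into per-person ratios: how much of each outcome the average member of the 22% segment accounts for, relative to the average member of the remaining 78%. This is only back-of-envelope arithmetic on the figures quoted above; the function name is my own, and nothing here comes from the paper's code.

```python
def per_capita_ratio(segment_share, outcome_share):
    """Per-person rate inside the segment divided by the per-person rate outside it."""
    inside = outcome_share / segment_share
    outside = (1 - outcome_share) / (1 - segment_share)
    return inside / outside

SEGMENT = 0.22  # share of the cohort in the high-cost segment, from the abstract

# Share of each outcome attributed to that segment, as quoted in the abstract.
shares = {
    "injury insurance claims": 0.36,
    "excess obese kilograms": 0.40,
    "cigarettes smoked": 0.54,
    "hospital nights": 0.57,
    "welfare benefits": 0.66,
    "fatherless child-rearing": 0.77,
    "prescription fills": 0.78,
    "criminal convictions": 0.81,
}

for name, share in shares.items():
    print(f"{name}: {per_capita_ratio(SEGMENT, share):.1f}x")
```

On these numbers, the average segment member accounts for roughly twice the injury claims and roughly fifteen times the criminal convictions of the average person outside the segment.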

Some of those relationships eased back in multivariate analysis with childhood SES included. But that's a tricky thing. If income is increasing in IQ (albeit concavely), then childhood SES depends on parents' IQ, but parents' IQ is a predictor of the child's adult IQ independently of childhood SES. Some of the effect of childhood measures of brain health on adult outcomes is then unduly attenuated by inclusion of childhood SES in the regressions, as some of the IQ effect could be picked up as a measured SES effect. On the other side, a higher IQ kid born into a lower SES household with lower IQ parents would select into worse environments for cognitive development over time, following the Dickens-Flynn kind of model. You need twin studies or adoption studies to start teasing that out properly.
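
A toy simulation can illustrate the attenuation worry. Everything in it is an assumption for illustration: parents' IQ drives both childhood SES and the child's underlying ability, the age-3 test is a noisy measure of that ability, and the adult outcome depends only on ability, with no causal SES effect at all. The coefficients and noise levels are made up and have nothing to do with the Dunedin data.

```python
import random

def simulate(n=50_000, seed=0):
    """Generate (measured_iq, ses, outcome) rows under the assumed toy model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        parent_iq = rng.gauss(0, 1)
        ability = 0.6 * parent_iq + rng.gauss(0, 0.8)   # child's true ability
        ses = 0.6 * parent_iq + rng.gauss(0, 0.8)       # SES tracks parents' IQ
        measured_iq = ability + rng.gauss(0, 1)         # rough age-3 measure
        outcome = ability + rng.gauss(0, 1)             # SES has NO causal effect
        rows.append((measured_iq, ses, outcome))
    return rows

def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def fit(rows):
    """OLS slopes: outcome on IQ alone, then outcome on IQ and SES together."""
    m, s, y = zip(*rows)
    vm, vs, cms = cov(m, m), cov(s, s), cov(m, s)
    cmy, csy = cov(m, y), cov(s, y)
    iq_alone = cmy / vm
    det = vm * vs - cms ** 2
    iq_with_ses = (cmy * vs - csy * cms) / det
    ses_coef = (csy * vm - cmy * cms) / det
    return iq_alone, iq_with_ses, ses_coef

iq_alone, iq_with_ses, ses_coef = fit(simulate())
print(f"IQ slope, no SES control:   {iq_alone:.3f}")
print(f"IQ slope, with SES control: {iq_with_ses:.3f}")
print(f"SES slope:                  {ses_coef:.3f}")
```

In this setup the IQ slope shrinks once SES is added, and SES picks up a positive coefficient even though it does nothing causally. SES is acting as a second noisy proxy for the ability that the age-3 test measures imperfectly. If the age-3 measure had no noise, the SES coefficient would go to zero, so the size of the problem depends on how rough the early measure is.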

While a fifth of the Dunedin cohort was responsible for massive amounts of the cohort's crime, prescriptions, hospital stays, fatherless children and social welfare costs, another group within the cohort had almost nil costs.


The paper is optimistic about the potential for interventions on the identified group to reduce long-term costs and improve outcomes. I agree that identifying the cohort for targeting is important, but I'm a bit more pessimistic about the chances of success.

They note the data is right-hand censored at age 38 years. I wonder how many children had accrued to people in each of the above-pictured cohorts by that age.

Tuesday, 12 March 2013

Trusting Secret Data: Dunedin edition

Unless you've run the regressions yourself, it's often hard to trust empirical results. A lot of results are fragile - small changes in specifications, either changing date ranges or adding seemingly irrelevant variables, can change results. And endogeneity always makes inference hard in social sciences.

First best is running things yourself. If the data and code are available for replication and extension, then it's impossible to maintain outright fraud. And fragile results will eventually be commonly known to be fragile. Boing Boing recently ran a very nice summary in two parts on why we see so many contradictory results on guns and crime: all the specifications are fragile. On some questions, the science isn't strong enough to give much policy direction.

When the data is not publicly available, we really have to trust the people running the specifications. There can be good reasons for keeping data private. Most self-interestedly, if the researchers did a ton of work in putting together the dataset, they might well want to get a few papers based on it before letting everyone else pile on. It's not an edifying reason, but even if the data collection involves public funding, it doesn't seem unreasonable to give the survey team the first few kicks at the can. On the other hand, if the data cannot be shared without violating the privacy of individuals who answered the surveys, and if there aren't clever ways of anonymising the data so that it cannot be used to identify individuals, then that's a pretty good reason to be stingy with the data. 

But, there potentially being good reasons for secret data certainly isn't sufficient reason to trust the consequent analysis. And so we come to Dunedin vs Rogeberg.

The Dunedin Longitudinal Survey is secret data, likely for the good reason of wishing to maintain the anonymity of survey respondents. We really have to trust the people running regressions on secret data. The Dunedin group recently found that early marijuana use predicts IQ decline. Ole Rogeberg wondered whether the results were due to cohort selection effects: the kids most likely to try marijuana early could well have had different results even had marijuana not existed. Rogeberg provided Dunedin a list of tests that might sort things out. He summarises things here:
I’ll start with a short recap: Researchers published article august 2012 arguing that adolescent-onset cannabis smoking harms adolescent brains and causes IQ to decline. I responded with an article available here arguing that their methods were insufficient to establish a causal link, and that non-cognitive traits (ambition, self-control, personality, interests etc) would influence risks of adolescent-onset cannabis use while also potentially altering IQs by influencing your education, occupation, choice of peers etc. For various reasons, I argued that this could show up in their data as IQ-trends that differed by socioeconomic status (SES), and suggested a number of analyses that would help clarify whether their effect was biased due to confounding and selection-effects. In a reply this week (gated, I think), the researchers show that there is no systematic IQ-trend difference across three SES groups they’ve constructed. However, as I note in my reply (available here), they still fail to tell us how different the groups of cannabis users (never users, adolescent-onset users with long history of dependence etc) were on other dimensions, and they still fail to control for non-cognitive factors and early childhood experiences in any of the ways I proposed. In fact, none of the data or analyses that my article asked for have been provided, and the researchers conclude with a puzzling claim that randomized clinical trials only show “potential” effects while observational studies are needed to show “whether cannabis actually is impairing cognition in the real world and how much.”
I'll be interested in seeing the Dunedin group's reply when it comes out. Rogeberg points to evidence the Dunedin group themselves have provided showing reasonable cohort differences between users and non-users. It's not implausible that these kinds of differences are responsible for at least some of their measured marijuana effect. It would be simplest if Dunedin would send Rogeberg the data and make him sign a data confidentiality waiver; it's not particularly plausible that a Norwegian labour econometrician wants the data for identifying NZ individuals. But if they're not willing to do that, they should at least be willing to put his code to their data. 

I'll be downgrading my trust in all the Dunedin Longitudinal results if they don't handle this well; secret data requires trust.

There is a way around all of these kinds of problems though.

The General Social Survey in the US would have almost as many privacy concerns as the Dunedin survey. And yet they're able to make available an online resource letting anybody with a web browser run analysis on the data, for free. I regularly set assignments in my undergrad public choice class where students without much in the way of metrics background have to go and muck about in the data.

As I understand things, the Health Research Council funds the Dunedin surveys. There's a worldwide movement toward open data in government-funded projects. HRC could fund a Dunedin Longitudinal equivalent of the GSS browser analytics. Anything personally identifying, like census meshblock, could be culled. Nobody would see the individual observations. And any cross-tab that would reduce to too-small a number of observations could return nulls. But folks with concerns about Dunedin studies could do first-cut checks without having access to the bits that might cause legitimate worries about privacy. Somebody like Rogeberg should be able to run a t-test on whether those who later go on to report marijuana use differed on other important variables prior to their starting consumption.
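
As a rough sketch of what such a front end might do, the snippet below returns cross-tab counts but nulls out any cell below a minimum size, and refuses to run a two-sample test when either group is too small. The threshold, the function names, and the record layout are all hypothetical; this is not how SDA or the Dunedin group actually handle disclosure control, and a real system would also need to worry about suppressed cells being backed out from marginal totals.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance

MIN_CELL = 5  # hypothetical disclosure threshold

def safe_crosstab(records, row_key, col_key, min_cell=MIN_CELL):
    """Count records per (row, col) cell, returning None for too-small cells."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    return {cell: (n if n >= min_cell else None) for cell, n in counts.items()}

def welch_t(a, b, min_cell=MIN_CELL):
    """Welch's t statistic for two samples, or None if either group is too small."""
    if len(a) < min_cell or len(b) < min_cell:
        return None
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Made-up records: childhood SES band and whether the respondent later reported use.
records = (
    [{"ses": "low", "later_user": True}] * 6
    + [{"ses": "high", "later_user": True}] * 2
    + [{"ses": "high", "later_user": False}] * 7
)
print(safe_crosstab(records, "ses", "later_user"))

# Made-up pre-onset scores for later users vs never-users.
print(welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```

The analyst only ever sees aggregate counts and test statistics, never the individual rows, which is the property that matters for the privacy argument.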

We all could trust Dunedin results more if we could check things like this. And it shouldn't be hard to set this up either: see how SDA coded things for GSS, put in the Dunedin data instead, then host it. If HRC did this, one small bit of funding could have a whole ton of researchers publishing useful studies based on the already-funded survey instead of having to fund Dunedin for every new regression. Let the Dunedin group keep any new wave of the survey under wraps for a couple of years before releasing it to the online analytics so they get a fair kick at the can. After that, why not open things up?

Tuesday, 19 February 2013

Underlying type

Does alcohol or drug use lead to a whole pile of other risk-taking activities, or does some common underlying risk preference determine both substance abuse and other risky behaviours?

The Dunedin Longitudinal Survey group finds evidence that women with more sexual partners are more likely to later report substance dependence disorders than those with fewer partners. Women having had more than 2.5 sexual partners per year between the ages of 18-20 are 9.6 times more likely to report substance dependence disorders at age 21, adjusted for prior disorder incidence. Women aged 26-31 having had more than 2.5 partners per year are 17.5 times as likely to report substance dependence disorder at age 32. Similar patterns held among males, although the risk ratios were much smaller. 
The explanation for the relationship is likely to be complex. Four possibilities are proposed. First, sexual risk taking and substance use may be part of the cluster of risk taking behaviors common in adolescence and young adulthood (Arnett, 1992; Boyer et al., 2000; Caspi et al., 1997; Desiderato & Crawford, 1995; Donovan & Jessor, 1985; Taylor, Fulop, & Green, 1999). For instance, people who are impulsive may be more likely to engage in both activities and, consequently, more likely to become substance dependent. Second, occasions of substance use are opportunities for sexual behavior because of its disinhibitory effects and lack of accurate perception of risk (Crowe & George, 1989; Fromme, D’Amico, & Katz, 1999). Weinhardt and Carey (2000) have suggested, in a review of event-level research on this topic, that the association, especially with condom use, is also complex. Thirdly, shared context may be an important factor, insomuch as young people are likely to meet new sexual partners in situations where alcohol is served. These settings might encourage sexual behavior and facilitate multiple partnering.
The fourth intriguing possibility is that it is something about having multiple sex partners itself which puts people at risk of substance disorder. For instance, it may be due to the impersonal nature of such relationships. Or, it might be that multiple failed relationships create anxiety about initiating new relationships. Self "medication" with substances may be one way of dealing with this interpersonal anxiety (Khantzian, 1997; Stoner, George, Peters, & Norris, 2006).
They also note the studies showing that alcohol use correlates with more risky sexual practices; I didn't see reference to the one suggesting drinking was associated with more positive consequences of sex.

I'd also worry about cohort attrition effects: if men and women who are more psychologically stable are more likely to get married before the age of 30, then the pool of women reporting >2.5 sexual partners per year* between the ages of 26-31 is probably different from the pool of women reporting the same numbers in their early 20s [recall that the Dunedin study follows a cohort born in 1972-1973]. 9.5% of women reported 2.5+ partners per year at 18-20; that dropped to 4.5% by age 21-25 and to 1.7% - 8 women - by age 26-31.

I wish that the Dunedin study had some calibrated measure of risk tolerance, like the Holt and Laury measure, as well as a measure of individual discount rates. I would love to see pinned down what portion of risk-taking behaviour comes down to heterogeneity in individual risk tolerance, what portion comes down to things with longer-term costs being disproportionately preferred by those who avoid the tyranny of the later-self, and what portion might be due to amplification effects where doing one risky thing actually does make you more likely to do another risky thing.

But the takeaway here is that studies suggesting drinking is associated with riskier sexual activity might well worry about reverse causation or common underlying causes.

* No, you can't have half a sexual partner. They're asked number of partners and that's averaged across the age range for that respondent.