Saturday, 25 February 2012

Confounds, alcohol and violence

It's plausible that increased alcohol consumption in a neighbourhood directly affects assault rates. But I don't think you can tell that from this study (HT: Bakadesuyo).

Methods and Findings

We performed a population-based case-crossover analysis of all persons aged 13 years and older hospitalized for assault in Ontario from 1 April 2002 to 1 December 2004. On the day prior to each assault case's hospitalization, the volume of alcohol sold at the store in closest proximity to the victim's home was compared to the volume of alcohol sold at the same store 7 d earlier. Conditional logistic regression analysis was used to determine the associated relative risk (RR) of assault per 1,000 l higher daily sales of alcohol. Of the 3,212 persons admitted to hospital for assault, nearly 25% were between the ages of 13 and 20 y, and 83% were male. A total of 1,150 assaults (36%) involved the use of a sharp or blunt weapon, and 1,532 (48%) arose during an unarmed brawl or fight. For every 1,000 l more of alcohol sold per store per day, the relative risk of being hospitalized for assault was 1.13 (95% confidence interval [CI] 1.02–1.26). The risk was accentuated for males (1.18, 95% CI 1.05–1.33), youth aged 13 to 20 y (1.21, 95% CI 0.99–1.46), and those in urban areas (1.19, 95% CI 1.06–1.35).
What's the problem? They don't seem to be controlling for day-level fixed effects or, even better, day-city fixed effects. Suppose there's a big hockey game on Saturday night that both brings a pile of folks onto the street and increases alcohol purchases. You can get a correlation between increased alcohol sales (relative to the week prior) and assaults entirely as an artefact of the underlying variable driving both assaults and alcohol sales. A big hockey game, a holiday long weekend, even a big concert in town - none of those are addressed by comparing alcohol sales with those a week prior.

How do you fix this? Controlling for simultaneous alcohol sales in a similar part of town that's far enough away that it's unlikely to have had effects on the part of town in question would be a start, but might not catch localized effects of events that drive both alcohol sales and violence.
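
To make the worry concrete, here is a minimal toy simulation, not a re-analysis of the study. Every number in it is an assumption chosen for illustration: a hypothetical "event" day (a big game, say) happens 20% of the time, raises alcohol sales, and separately raises assaults, while the true alcohol effect is set to 0.5. It uses plain linear regression rather than the paper's conditional logistic regression, so it only shows the direction of the bias, not its size in the Ontario data.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean: equivalent to adding group fixed effects."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n=20000, true_effect=0.5, seed=0):
    """Return (naive, adjusted) estimates of the alcohol effect.

    All parameters are illustrative assumptions, not estimates from the paper.
    """
    rng = random.Random(seed)
    event, alcohol, assaults = [], [], []
    for _ in range(n):
        e = 1 if rng.random() < 0.2 else 0           # hypothetical big-event day
        a = 1.0 + 2.0 * e + rng.gauss(0, 1)          # event pushes sales up
        v = true_effect * a + 1.5 * e + rng.gauss(0, 1)  # event also acts directly
        event.append(e)
        alcohol.append(a)
        assaults.append(v)

    naive = slope(alcohol, assaults)                 # event left out
    adjusted = slope(demean_by_group(alcohol, event),
                     demean_by_group(assaults, event))  # event controlled for
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate()
    print(f"naive estimate:    {naive:.2f}")
    print(f"adjusted estimate: {adjusted:.2f} (true effect is 0.5)")
```

In this setup the naive estimate overstates the alcohol effect, and controlling for the event pulls it back toward the true value. The catch, of course, is that in real data you have to observe the event to control for it.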


  1. I think you are being overly nit-picky here. Two points:

    1. Day-city fixed effects will eat all their 3,200 degrees of freedom and then some. So you can't do it that way. Even day fixed-effects will eat about 900, which would be a problem, too.

    2. What is your causal model here? That "big hockey game" and "boozing" are *independent* causes of violence? I don't think so. I have a different theory: big hockey game causes boozing causes violence. Same for long weekends or concerts. The boozing remains the proximate cause. Maybe there is an interactive effect, which you can test for directly without resorting to hundreds or thousands of fixed effects. Your other proposed solution wouldn't achieve this, though. A long weekend in one part of town is also a long weekend across the other side of town.

    To me, this looks like a pretty solid study. Sure, the r-squared isn't 1, and there is some localized unexplained variation. But if we threw out all studies whose r-squared did not approach 1, where would that put us?

    1. I'm not looking for an R2 of 1, Rob. I'm hoping instead that we can rule out that there's anything going on that simultaneously determines drinking and violence. Your causal mechanism's pretty plausible too. Or, we can have both going on, right? X -> violence and X -> alcohol -> violence. Example - weather. Hotter days have more violent crime, and also have folks drinking more beer. Easy to imagine both channels working. But if we leave the X out, we overestimate alcohol's effect.

    2. As I mentioned, I think the best way to get at these additional / alternative mechanisms is to form specific interactive hypotheses and test them directly. Temperature data is easy, so are long weekends. You can start to get at "big hockey game" with daily NHL TV ratings for Ontario. Etc. The point is you form a specific hypothesis tested directly, using 2 DF each (1 for constitutive effect, 1 for interactive effect), rather than throwing hundreds of atheoretical FE control variables at the regression which increase the risk of an erroneous null result and also tell you nothing about your additional hypotheses.

      My bet is that the alcohol result will stand up to these interactive effects, especially the young males result. It looks to me like a strong result. I do not see their failure to do this stuff as invalidating their study. Not every study can test every hypothesis, and I feel I know more having seen this study than I did before.

    3. Those work to the extent that you know what the shocks are, right?

      For my money, I rather prefer the identification strategy in Carpenter (2007).
      They use drinking age changes causing changes in alcohol consumption for specific age groups but not others; differences in crime rates among 18-21 vs 21+ identify the alcohol effect.

  2. just trying to work out what to do here Eric, is it drink with the neighbour, study statistical irrelevance, refuse strategical questions by analytical academics, or just get out there and kick arses