David Levy and Susan Feigenbaum worried a lot about this in "The Technological Obsolescence of Scientific Fraud". Where investigators have preferences over outcomes, it's possible to achieve those outcomes through appropriate choice of identifying restrictions or methods - especially since there are lots of line calls about which techniques to use in which cases. They note that outright fraud makes results non-replicable, while biased research winds up instead being fragile: the relationships break down when people change the set of covariates, or the time period, or the technique.
Note that none of this has to come through financial corruption either: simple publish-or-perish incentives are enough where journals are more interested in significant than in insignificant findings; DeLong and Lang jumped up and down about this twenty years ago. Ed Leamer made similar points even earlier (recent podcast). And then there's all the work by McCloskey.
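A quick toy simulation makes the significance-filter point concrete. The sketch below is my own illustration in Python, with made-up sample sizes and thresholds rather than anything taken from DeLong and Lang: the true effect is exactly zero in every study, but only the studies that clear p < .05 get "published", so the published record is all false positives with impressively large effect sizes.

```python
# Toy illustration of the significance filter: the true effect is zero in
# every study, but only studies with p < .05 get "published".
# All numbers here are made up for illustration; nothing is taken from
# DeLong and Lang's paper.
import numpy as np

rng = np.random.default_rng(0)
n_studies, n_per_study = 10_000, 30
t_crit = 2.045  # approx. two-sided 5% critical value for df = 29

published = []
for _ in range(n_studies):
    x = rng.normal(0.0, 1.0, n_per_study)                  # null is true
    t = x.mean() / (x.std(ddof=1) / np.sqrt(n_per_study))  # one-sample t
    if abs(t) > t_crit:
        published.append(abs(x.mean()))                    # the filter

print(f"share of studies 'published': {len(published) / n_studies:.3f}")  # ~0.05
print(f"mean published effect size:   {np.mean(published):.2f} sd")       # far from 0
# Every published estimate is a false positive, and the published effects
# look substantial, because conditioning on significance selects the noise.
```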
Thomas Lumley today points to a nice piece in Psychological Science demonstrating the point.
In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.

Degrees of freedom available to the researcher make it "unacceptably easy to publish 'statistically significant' evidence consistent with any hypothesis." They demonstrate it by proving statistically that hearing "When I'm Sixty-Four" rather than a control song made people a year-and-a-half younger.
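Their simulation logic is easy to reproduce. Here is a minimal sketch in Python - a toy version with assumed sample sizes and a 0.5 correlation between the two outcome measures, not the authors' code: the "researcher" has two correlated dependent variables, reports whichever one comes out significant, and gets one chance to add twenty more observations per group if the first look misses. The false-positive rate lands well above the nominal 5%.

```python
# A minimal sketch of the researcher-degrees-of-freedom simulations in
# Simmons, Nelson and Simonsohn (2011) -- a toy version, not their code.
# Assumptions: two outcome measures correlated at r = 0.5, no true group
# difference, 20 observations per group with the option to add 20 more.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_flexible_study(n1=20, n_extra=20, alpha=0.05, r=0.5):
    """Return True if flexible analysis yields p < alpha despite a true null."""
    cov = [[1.0, r], [r, 1.0]]
    a = rng.multivariate_normal([0, 0], cov, n1 + n_extra)  # group A, both DVs
    b = rng.multivariate_normal([0, 0], cov, n1 + n_extra)  # group B, both DVs

    def any_dv_significant(n):
        # report whichever of the two outcome measures 'worked'
        pvals = [stats.ttest_ind(a[:n, j], b[:n, j]).pvalue for j in (0, 1)]
        return min(pvals) < alpha

    # peek after n1 observations per group; if nothing is significant,
    # collect n_extra more per group and test again
    return any_dv_significant(n1) or any_dv_significant(n1 + n_extra)

fp_rate = np.mean([one_flexible_study() for _ in range(10_000)])
print(f"false-positive rate with flexible analysis: {fp_rate:.3f}")  # well above .05
```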
The lesson isn't radical skepticism of all statistical results, but rather a caution against overreliance on any one finding and an argument for discounting findings coming from folks whose work proves to be fragile.
"The lesson isn't radical skepticism of all statistical results, but rather a caution against overreliance on any one finding and an argument for discounting findings coming from folks whose work proves to be fragile."
Indeed. However, I think the realisation of just how important unobservables are in most circumstances - and how it is often difficult to deal with them - has made me discount empirical results more over time.
"none of this has to come through financial corruption" - of course. But it's not just publish-or-perish, but also personal beliefs. The literature on aid effectiveness is a great case. Doucouliagos and Paldam, in their meta-analysis of the 140 published studies in the field, for example find that people working with or financed by donor organisations are way more likely to 'find' that foreign aid works. Virtually no studies by really independent scholars find any positive effects of aid.
Doucouliagos is great for poking holes in empirical literatures, isn't he?