Tuesday 2 September 2014

Are ODI Scores Increasing? UPDATED

I had a conversation with a sports blogger, John Rogers, on Twitter last week. John Rogers had tweeted a link to a blog post he had written on why the WASP projection being used in BSkyB's coverage of limited overs cricket this English summer is necessarily inaccurate. His point is that ODI cricket is evolving quickly, both in the equipment and the style of batting, so that historical data is a poor guide to how many runs you can expect a team to score.

There is always a tradeoff in statistical work between using only the most recent data to capture trends, and using a longer time period to get more precise estimates. Now, in principle, since WASP is calibrated to a par score set by the broadcast commentators, any trend in scoring that has occurred within the period of the data used to estimate the model could be adjusted for in the par score. The setting of a par score is both a strength and a weakness of WASP. The strength is that it allows game-specific information to be factored into the projections, such as using local knowledge to assess how the pitch is likely to play. The weakness, however, is that the commentators might suffer from the common human bias of seeing patterns in essentially random data, and I wonder if the view that batting power is increasing is an example of that.

So I was interested to see if John's perception of a recent increase in scoring rates due to teams having more "lower-order hitters", better bats, etc. is borne out in the data. There is no doubt that there has been an increase in scoring over time. For example, all of the 16 ODI matches (all involving top-8 countries) where the team batting second has scored 330 or more have occurred this century. Only 5 of those 16, however, occurred this decade, suggesting that maybe the changes are not so recent.

Extreme scores like these are not necessarily indicative of a general trend, so some regression analysis is called for. John's hypothesis seems to be mainly based on increased rates of scoring by lower-order power hitters near the end of the innings. I don't have the full ball-by-ball database to hand, just a record of scores and results, but if the theory is correct, it should show up in total scores. Now WASP is currently based on ODI data from 2006 onwards involving the top-8 teams, so I had a look at all non-rain-shortened games involving those teams from May 1 2006, using a dummy variable for each year starting May 1. First, I looked at the evolution of first-innings scores over that time. To control for different abilities across countries, I ran an OLS regression of first-innings score on dummy variables for the team batting first and for the team bowling first, as well as a dummy variable for each of the 8 years in the database. To further control for differences across grounds, I restricted the data set to games played at grounds where there were at least 10 matches played in this period, and included a dummy variable for each ground. This left me with 245 games. The results are shown by the blue line in the graph below, with the line showing (left axis) the average first-innings score for the average team against the average team at the average ground. There clearly has been very little change over these 8 years.
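To make the dummy-variable setup concrete, here is a minimal Python sketch of that kind of regression. It is only an illustration under stated assumptions: the data are synthetic and noise-free (invented for the example, not taken from the ODI database), the three teams and all effect sizes are hypothetical, and the ground dummies are left out for brevity. The year coefficients measure the change in first-innings score relative to a baseline year, after controlling for who batted and who bowled.

```python
from itertools import permutations

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def dummies(value, levels):
    """0/1 indicators for every level except the first, which is the baseline."""
    return [1.0 if value == lev else 0.0 for lev in levels[1:]]

# Hypothetical teams, seasons, and effects (in runs); NOT estimates from real data.
TEAMS = ["A", "B", "C"]
YEARS = [2006, 2007, 2008]
BAT = {"A": 0.0, "B": 12.0, "C": -8.0}     # batting strength relative to team A
BOWL = {"A": 0.0, "B": -5.0, "C": 10.0}    # runs conceded relative to team A
YEAR = {2006: 0.0, 2007: 3.0, 2008: -2.0}  # season effect relative to 2006

# One synthetic game per (batting team, bowling team) pair per season.
X, y = [], []
for year in YEARS:
    for bat, bowl in permutations(TEAMS, 2):
        X.append([1.0] + dummies(bat, TEAMS) + dummies(bowl, TEAMS) + dummies(year, YEARS))
        y.append(250.0 + BAT[bat] + BOWL[bowl] + YEAR[year])

beta = ols(X, y)
# The last two coefficients are the 2007 and 2008 dummies.
year_effects = dict(zip(YEARS[1:], beta[-2:]))
print({yr: round(eff, 2) for yr, eff in year_effects.items()})
```

Because the synthetic scores contain no noise, the fitted year effects should match the assumed ones up to rounding. With real scores the same coefficients would come with standard errors, which this sketch does not compute.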

John's blog post, however, seemed to refer specifically to the ability of teams to chase down large scores, so I separately looked at whether there has been a change in the probability of the team batting second winning, using a probit regression. Because differences in grounds largely affect ease of scoring in both innings, and because probabilistic models require more data to get precise estimates, I used the full dataset without dummy variables for the ground, but again controlled for team ability and included dummy variables for each year. The results are shown in red on the same graph (right axis). Given how much data these models need, I wouldn't put too much faith in the estimates for any one year. But there doesn't seem to be a clear recent trend towards it being easier to chase down scores than in previous years, although there was a strange dip in the period 2007-2009 that has since been reversed.
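For readers unfamiliar with probit models, the following Python sketch shows the general idea on a toy example. It is a simplification under stated assumptions, not the regression actually run here: the win/loss records are invented, there are only two hypothetical seasons, and the team-ability controls are omitted. The model is fitted by Fisher scoring, one standard way of maximising a probit likelihood.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_probit(X, y, iterations=25):
    """Probit maximum likelihood by Fisher scoring, starting from beta = 0."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        info = [[0.0] * k for _ in range(k)]  # expected information matrix
        score = [0.0] * k                     # gradient of the log-likelihood
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            p = STD_NORMAL.cdf(eta)           # P(chasing team wins)
            phi = STD_NORMAL.pdf(eta)
            w = phi * phi / (p * (1.0 - p))
            g = phi * (yi - p) / (p * (1.0 - p))
            for i in range(k):
                score[i] += g * row[i]
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Invented records: the chasing team wins 4 of 10 games in the first
# hypothetical season and 6 of 10 in the second (1 = chasing team won).
X = [[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 10   # columns: intercept, second-season dummy
y = [1] * 4 + [0] * 6 + [1] * 6 + [0] * 4

beta = fit_probit(X, y)
p_first = STD_NORMAL.cdf(beta[0])
p_second = STD_NORMAL.cdf(beta[0] + beta[1])
print(round(p_first, 3), round(p_second, 3))
```

With nothing but a season dummy, the fitted probabilities simply reproduce each season's win fraction. The model only earns its keep once controls such as team ability are added, which is also when the data requirements grow.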

I suspect what is happening is a perception bias. There probably has been a recent increase in power hitting as a result of batsmen taking more risks, but that has been balanced by an increase in the rate of dismissals. And this leads to the reality being different from common perceptions. 20-20 has conditioned us to thinking that it is easy to score 8-9 runs an over on small grounds with flattish pitches. And it is. But it requires aerial shots, unlike 5-6 an over, which can be achieved entirely along the ground with 1s and 2s and the occasional bad ball cut or driven for 4 along the ground too fast for the cover sweeper to collect. With modern bats and batting, it is not difficult to sustain 8-9 an over through regular sixes and lofted 4s, but it is hard to do so without losing regular wickets. But wickets arrive randomly. Now think of the commentators' bias. If a batsman hits a clean six, he is lauded for his good shot. If he mistimes it and is caught, as often as not he will be criticised for "taking unnecessary risk" or "not waiting for the right ball". (Have you ever heard a commentator criticise a batsman for taking unnecessary risk after making a clean hit for 6?) This creates an impression that the good shots are normal, and the wickets are just an avoidable failure, rather than both being natural consequences of a particular level of aggression. Combine that with our recollections of past matches. Sometimes a team chasing 120 off 72 balls will have a randomly good passage scoring at that rate without any lofted shots going to hand, and it will make it look like such fast scoring is easy. At other times, we will see a procession of wickets and we will be thinking how the batsmen threw the game away. It is the first case that sticks in our mind when we make our own assessment of probabilities, and so we inflate in our own minds what the probabilities of winning are when a team is chasing a large total.

Data (even historical data that may become out of date) is a good antidote to these perception biases.

UPDATE: Chris Smith, of the wonderful cricket blog, Declaration Game, asks by tweet if the results would have been very different had I not restricted the data to top-8 countries and not controlled for team ability and ground. That is, would we observe a general increase in scoring, but one attributable to having more games with weak teams and more games at smaller grounds? Rerunning the numbers on the first-innings scores, if we include Bangladesh, Zimbabwe, Ireland and Afghanistan in the data (I don't have other countries in my database), and don't control for team ability, and don't control for grounds, we see a 10-run increase between the periods 2002-2007 and 2008-2013. Removing the four weaker teams and controlling for team ability only reduces that change by 2 runs. The big change comes when we restrict the data to grounds with at least 10 games in the dataset (still 540 games) and control for the ground. This reduces the change to 2 runs.


  1. You seem to be underestimating the effect of shifting even a relatively small number of people to rail from driving instead. Traffic congestion/delay is exponential; as it gets near capacity the costs sky-rocket. The initial rail plan is focused on the Northern M'way Corridor, which was suffering from at-capacity flows. With peak congestion it only takes a little change in numbers to have a big effect on delays. 40,000 vehs cross the Waimak Bridge each day; over 6000 would be during the peak periods. Back of the envelope calc: If shifting ~400 people to train reduces everyone else's peak trip by even 10 minutes (quite likely), then using NZTA travel time rates that's about $15,000 saved each day in travel time, or over $3 million/yr. A useful payback, even without the more direct benefits...

  2. Except it won't, because the trains would go to Addington, not the CBD. You might find 40 at best going to Addington. To build a CBD station requires restoring the east to north link at Addington, and finding a site for a station at Moorhouse Avenue, which also isn't exactly convenient, but better than Addington - which explains pretty much why Christchurch lost commuter rail first. The fetishisation of railways over relatively modest improvements that could be made to allow buses to bypass queues is bizarre. You need bespoke infrastructure, bespoke vehicles (and much more signalling) that is largely useless for anything else if it fails. Widen some hard shoulders and a few intersections, and you can use existing buses on an express service from where people live to where they want to go. If it fails, then little is lost and the buses can be used elsewhere. In Wellington, private commercial bus services thrive alongside commuter rail because of this - the Otaki-Wellington express bus (which lasted for many years), Wainuiomata-Wellington, and the Hutt Valley Flyer bus service all indicate that what matters is convenience and speed.

  3. What is most frustrating is that Labour (or any party) could propose a solution that works for the long haul and is "free" (er... politically anyway).

    My wife makes the commuter run from beyond Rangiora into central Christchurch regularly and has done for many years. She tells me her trip has improved over the last few months. So what's going on?

    First of all, there is little congestion on the motorway itself; the problems start as traffic exits the motorway onto the three main routes in and around the city. They are hopelessly congested and it's the tailback that makes the motorway appear clogged.

    As it turns out there are plans to declog two of those routes that have been in place for ten years. I don't know when CCC's Northern Corridor is scheduled for implementation but the government announced this year it would not start work on the Belfast Bypass (SH1) until 2018.

    What should Labour have announced? Throwing money at those two projects today would do it. What's more, the money is already there and doesn't have to be found. This year's Vote Transport has a significant lump of new money built in for new State Highways and new local roads. I don't know what this government had in mind for this money, but if I had been in Labour's shoes I would have recognised that, as (i) the significant planning work has all been done, (ii) the money is already there, and (iii) declogging those two routes will do more for accessing CHC from North Canterbury than anything else, it would be a no-brainer to advance that work.

    In the meantime my wife has found that as earthquake damaged streets in Christchurch get fully restored and useable, and as CCC make some improvements to Marshlands Rd in anticipation of the Northern Corridor project there are already discernible improvements to travel time.

    Again it's a no-brainer.
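
The back-of-the-envelope calculation in the first comment can be reproduced in a few lines of Python. This is only a sketch: the comment does not state the hourly value of travel time or the number of commuting days, so the $15 per vehicle-hour used here is simply the rate implied by its own figures, and 250 days a year is an assumption.

```python
PEAK_VEHICLES = 6000     # from the comment: "over 6000" vehicles in the peak periods
MINUTES_SAVED = 10       # from the comment: saving per peak trip
VALUE_PER_HOUR = 15.0    # ASSUMPTION: $/vehicle-hour implied by the comment's $15,000/day
COMMUTING_DAYS = 250     # ASSUMPTION: commuting days per year

hours_saved_per_day = PEAK_VEHICLES * MINUTES_SAVED / 60
daily_saving = hours_saved_per_day * VALUE_PER_HOUR
annual_saving = daily_saving * COMMUTING_DAYS

print(f"{hours_saved_per_day:.0f} vehicle-hours/day, "
      f"${daily_saving:,.0f}/day, ${annual_saving:,.0f}/yr")
# prints: 1000 vehicle-hours/day, $15,000/day, $3,750,000/yr
```

The result is linear in every input, so halving the assumed time saving to 5 minutes halves the annual figure; the estimate stands or falls on whether a 10-minute saving is realistic.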