
Wednesday, 4 March 2015

Batting out your overs


The mantra that "the biggest sin a team batting first in an ODI can commit is to not bat out its overs" has long been a bugbear of mine. As Dan Liebke noted in a rant about net-run-rate the other day, 
We've had Duckworth Lewis for decades now and, even if the mathematics of it is beyond most casual fans, the basic concept that wickets remaining are a resource that need to be considered along with overs remaining is pretty well established. 
Yes, a team has two resources. If it is a sin not to use one of those two resources to the max, why is it not also a sin to bat out 50 overs leaving capable batsmen in the pavilion with their pads on? A batting team has to manage both declining resources with no certainty as to the effect that its actions will have on either the rate of scoring or the loss of wickets. 

So I was very happy to see Chris Smith take on this mantra in his Declaration Game blog, and also to see him quote a former player, Geoff Lawson, who was prepared to take a contrarian view. 
'Why?' asked Geoff Lawson, who went on to rationalise that if all the batting side attempted was to survive the 50 overs, they were very unlikely to set a winning total. 'Wouldn't it be better', Lawson argued, 'to hit out with the aim of setting a challenging target, accepting the risk that they could be bowled out, than to crawl to an unsatisfactory total?'
Lawson is right, although maybe not quite for the right reason. In this quote, he seems to be suggesting that a team heading towards a very low score might as well start taking more risks to get to a competitive total. That logic would follow from Jensen's inequality if the relationship between the total score and the probability of winning were strongly non-linear, but in fact that relationship is pretty much linear over the range of outcomes that can arise on any particular ball. That means a batting team should ignore the current score, accept bygones as bygones, and base its level of aggression on how many balls and wickets it has remaining.

As it happens, we can quantify this decision reasonably precisely. The graph below gives a measure of what I like to call "deathness" for the first innings. The particular metric I use is the payoff to a risky single. Imagine that the batsmen have to choose between trying for a run or not. If they choose not to run, they will score 0 runs but not lose a wicket. If they try for the run, the attempt might succeed, or it might fail and one batsman will be run out. What probability of being run out would be high enough to make the risk not worth taking? The graph shows that cross-over probability as a function of the number of overs bowled, for each possible number of wickets lost. The higher that probability, the greater the risk that is worth taking, and so the greater the level of deathness (so called because the final overs of an innings, when batsmen start to take higher levels of risk, are often termed "the death"). The actual numbers aren't particularly interesting (most decisions on aggression are about striking the ball, not about whether to attempt a run), but the comparison across the different lines in the graph is revealing. So, for example, the graph reveals that if a particular level of aggression is warranted after 40 overs when a team is 5 wickets down, then the same level can be justified at 23 overs if no wickets have been lost.
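For anyone who wants to see the arithmetic, here is a minimal sketch (in Python) of how that cross-over probability falls out of an expected-additional-runs table V(b,w) of the sort described in the WASP posts below. The V values in the example are invented placeholders, not actual WASP estimates.

    # Break-even run-out probability for a risky single, given a table
    # V[b][w] of expected additional runs with b balls bowled and w wickets
    # lost. The numbers below are illustrative, not real WASP values.

    def breakeven_runout_prob(V, b, w):
        """Declining the run leaves expected additional runs of V[b+1][w].
        Taking it gives 1 + V[b+1][w] with probability (1 - p), or V[b+1][w+1]
        if run out. Equating the two and solving for p gives
        p* = 1 / (1 + wicket_cost), where wicket_cost = V[b+1][w] - V[b+1][w+1]."""
        wicket_cost = V[b + 1][w] - V[b + 1][w + 1]
        return 1.0 / (1.0 + wicket_cost)

    # Illustration: with 10 overs left and 5 wickets down, if the next wicket
    # "costs" about 9 expected runs, the single is worth attempting whenever
    # the chance of a run out is below about 10%.
    V = {241: {5: 62.0, 6: 53.0}}
    print(breakeven_runout_prob(V, 240, 5))   # -> 0.1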



Before getting to batting out your overs, a few things to note about this graph:
  1. It is based on WASP data that predates the rule change to two new balls and only four fielders outside the circle. That said, the basic story would not change using more recent data or some other estimate of the cost of a wicket, such as the Duckworth-Lewis tables. 
  2. This graph indicates what the expected payoffs are to different levels of risk and return in different game situations; it does not show what different risk-return combinations are possible. So, for 0-7 wickets down, the graphs indicate that the cost of risk is high at the start of the innings (the probability of a run-out has to be very low to justify attempting a run). With the fielding restrictions in the first 10 overs, however, it can be that the return to batsmen from a particular level of risk is much higher than in the middle overs, so that a high-risk strategy is still worthwhile, despite the costs. 
  3. The graphs all hit 100% for the final ball of the innings. That makes sense. It is simply saying that as long as there is any probability whatsoever of not being run out, you might as well keep running until you lose your wicket on the final ball.  
  4. Interestingly, though, for 1, 3 and 6 wickets lost, the graphs hit 100% before getting to the final ball of the innings. Remember that this is based on average-team versus average-team data. What is going on here is that, on average, the batters deeper in the batting order are better at power slugging than those further up the order. So, for example, it is common for a batting order to have two aggressive openers followed by an accumulating #3 to take the team through the middle overs. If a team gets to 43 overs with only one wicket down, it might be better to go for a suicidal run (with the #3 coming to the danger end) and bring in a power hitter than to play out a dot ball. 
  5. The graph for 9 wickets down slopes downward for most of the innings. This mostly reflects out-of-sample extrapolation (there is no actual data for games where a team is 9 wickets down after 2 overs), but also the fact that when a team is 9 wickets down very early, there is almost no chance it will bat out its overs and it is likely to lose its last wicket at any time, so it's worthwhile for the batters to take risky singles while they are still there to do so. The longer the innings progresses, the less reason there is to think that the next wicket is imminent, and so the more need for caution. 
  6. While there is a general tendency for the graph to be lower the more wickets that have been lost, this tendency is not absolute. This is because, while losing a wicket will reduce the expected number of runs the team will score, the cost of the next wicket is not necessarily greater. For example, after about 46 overs, the incremental cost to a team of losing its 7th wicket is less than losing its 5th or 6th at that stage, so a team being 6 wickets down should be more aggressive than one that has lost only 4 or 5 wickets. 
So let's now think about batting out your overs. In the World Cup game between New Zealand and England, England, batting first, lost their 6th wicket at 28.1 overs, their 7th later in the same over, and their 8th at 30.4 overs. Looking at the purple, yellow and pink lines, the deathness measures at 28-30 overs are all pretty much the same. Yes, a lot more caution was called for than if they had only been 2 wickets down at that point (and so Broad's approach at that point was probably not beyond reproach), but the optimal strategy was not for the team to go into its shell either. Rather, the situation called for playing in much the same style as any team should in the middle overs (10-30), but delaying all-out aggression for a bit longer than if they had more wickets in hand. This pretty much describes any situation where the "make sure you bat out your overs" comment is likely to arise. A team should probably delay its all-out assault for a bit if it loses too many wickets, but at no point should it bat more conservatively than in a normal middle-overs situation.  

Thursday, 19 February 2015

The trouble with Net Run Rate

In any competition in which there is pool or round-robin play to rank teams before playoff rounds, there needs to be some method of deciding the relative ranking of teams who finish equal on wins and losses. Ideally, this method will reward the teams that have performed best, and also not create any perverse incentives for teams to do anything other than act in a way to maximise their probability of winning.

A nice example of perverse incentives came in the 1999 Cricket World Cup. Only two teams out of New Zealand, Australia, and the West Indies were going to carry on from their group into the next round. The rules were such that teams carried through only their results against other teams that made it to the next round. Prior to the match between the West Indies and Australia, New Zealand had beaten Australia but had lost to the West Indies. Australia therefore needed to beat the West Indies, but also wanted the West Indies to be the team that carried through with them so that their loss against NZ didn't matter. As is traditional in the Cricket World Cup, the method used to rank teams with equal numbers of wins and losses was net-run-rate (NRR)--the difference between a team's average runs scored per over faced and its average runs conceded per over bowled. Batting second, Australia therefore staged a deliberate go-slow on their way to a win, with their 5th-wicket partnership taking an extraordinary 127 balls to score the 49 remaining runs needed for a win. This was designed to elevate the West Indies' NRR above New Zealand's. As it turned out, the strategy was not successful, as New Zealand still had a match against the lowly ranked Scotland, and took extraordinary risks to not only win that match but win it by a sufficient margin for their NRR to overtake the West Indies'.
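For concreteness, here is a minimal sketch of the NRR calculation just described. The match figures are invented, and the rule that credits a team bowled out with its full quota of overs is ignored for simplicity.

    # Net run rate: runs scored per over faced minus runs conceded per over
    # bowled, aggregated over a team's games.

    def net_run_rate(innings):
        """innings: list of (runs_scored, overs_faced, runs_conceded, overs_bowled)."""
        scored = sum(i[0] for i in innings)
        faced = sum(i[1] for i in innings)
        conceded = sum(i[2] for i in innings)
        bowled = sum(i[3] for i in innings)
        return scored / faced - conceded / bowled

    # e.g. a 50-over 280 defended successfully, then a chase of 181 in 30 overs
    print(round(net_run_rate([(280, 50, 230, 50), (181, 30, 180, 50)]), 2))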

In the current World Cup, there isn't the same "super 6" second stage where teams only carry through some of their points from the first round, but NRR is still used as the tie-breaker. This system is still flawed, as exemplified by Tuesday's match between New Zealand and Scotland. Anyone looking at the two innings scores could be forgiven for thinking that the match was close. It wasn't. What happened was that New Zealand bowled Scotland out for a very low total, and was almost guaranteed a win. When it was New Zealand's turn to bat, they strove to win the match in as few overs as possible, in order to maximise their runs-per-over figure. The fact that they lost 7 wickets in the attempt meant that they did present Scotland with a sniff of a chance of an upset, but the 7 wickets will have no bearing on their eventual NRR.

This exemplifies three problems with NRR:
  1. The effect of a large win against a lower-ranked team on NRR depends on which team bats first, since the team batting second only bats until it has overtaken the other team's score, meaning that that innings gets a lesser weight in the runs-per-over calculation than an innings where all 50 overs are faced. 
  2. The magnitude of a victory when the team batting second wins is a function not only of how many balls it took the team to amass the winning total but also the number of wickets lost in the process. NRR only takes the former into account. This creates the perverse incentive where New Zealand put their win (slightly) at risk by worrying only about how many overs they used and not how many wickets they lost. 
  3. The ranking of two or more teams should not depend on which one beat up the most on a team ranked well below them. If, as could easily happen, three teams (say, Australia, New Zealand and Sri Lanka) finish in a tie for first place in their group, the determination of who goes through to the quarter-finals ranked 1st, 2nd, and 3rd should not come down to which team beat Scotland by the biggest margin.
So with these flaws in mind, here is a sequence of proposals to replace NRR with a different tie-breaking rule.

Adjustment 1: To deal with the first problem above, use the average margin of victory/loss rather than NRR: If the team batting second loses, its margin is its score divided by the score required to tie the match. This will be less than 1. The winning team's margin is the reciprocal of this--the score required to tie divided by the chasing team's score. If the team batting second wins, its margin is the number of balls available to it + 1, divided by the number of balls actually used. The losing team's margin is again the reciprocal of this. In the case of a tie, the margin is 1.0 for both teams.
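In code, the rule looks something like this (the scores below are hypothetical):

    # Adjustment 1 margins. A failed chase is credited with score / score
    # needed to tie; a successful chase with (balls available + 1) / balls
    # used; the other side always gets the reciprocal; a tie is 1.0 for both.

    def adjustment1_margins(chasing_score, tie_score, balls_used, balls_available):
        """Return (margin for team batting first, margin for team batting second)."""
        if chasing_score == tie_score:                 # tied match
            return 1.0, 1.0
        if chasing_score < tie_score:                  # chase fails
            m2 = chasing_score / tie_score             # < 1
            return 1.0 / m2, m2
        m2 = (balls_available + 1) / balls_used        # chase succeeds, > 1
        return 1.0 / m2, m2

    print(adjustment1_margins(251, 250, 270, 300))     # chase of 251 with 30 balls to spare
    print(adjustment1_margins(220, 250, 300, 300))     # failed chase finishing on 220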

Adjustment 2: To deal with the problem of teams sacrificing wickets for the sake of fast scoring, amend Adjustment 1 in the case where the team batting second wins, by dividing the predicted score at the end of 50 overs by the score required to tie (the implicit score predictor in Duckworth-Lewis would work for this, although I'd prefer to use WASP due to its adjustment to conditions).

Adjustment 3: Make the calculations iteratively. Let there be n teams in a pool. Construct the table at the end of pool play using points scored, and using Adjustments 1 and 2 to rank teams otherwise tied. Then remove the bottom-ranked team and give it a rank of n. Now reconstruct the table using only games played amongst the remaining n-1 teams, and again find the lowest-ranked team. Give it rank n-1, remove it, and reconstruct the table with the remaining n-2 teams, and so on. As an example of how this could be beneficial, imagine that in the current World Cup, Sri Lanka beat Australia, Australia beat NZ, NZ beat Sri Lanka, and all three beat England and the other three teams, except that the game between Australia and Scotland was rained out. Under the system in place for this competition, Sri Lanka and NZ would finish ahead of Australia simply because Australia were denied the opportunity to play Scotland. Under Adjustment 3, the games against Scotland would be irrelevant for deciding the relative ranking of the top three teams.*
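A sketch of the iterative procedure follows; the construction of the table itself (points, with Adjustments 1 and 2 as tie-breakers) is passed in as a placeholder function rather than spelled out.

    # Adjustment 3: repeatedly drop the bottom-ranked team, then re-rank the
    # remaining teams using only the games played amongst themselves.

    def iterative_ranks(teams, games, rank_table):
        """teams: list of team names; games: list of (team_a, team_b, result);
        rank_table(teams, games) -> teams ordered best to worst, using points
        and the Adjustment 1-2 tie-breakers on the games supplied."""
        remaining = list(teams)
        final_rank = {}
        while remaining:
            relevant = [g for g in games if g[0] in remaining and g[1] in remaining]
            ordered = rank_table(remaining, relevant)
            bottom = ordered[-1]
            final_rank[bottom] = len(remaining)        # rank n, then n-1, ...
            remaining.remove(bottom)
        return final_rank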

Adjustment 4: O.K. now I am getting well out of the realm of feasible rules and into the kind of competition we would have if the ICC comprised exclusively economists, but it is fun to speculate. My Adjustment 2 still does not properly align incentives, because maximising the expected margin of victory is not the same thing as maximising the probability of victory. So instead, let's define the margin of victory in the following way. Draw the WASP-worm graph of the percentage probability of winning for the second innings as a function of the number of balls bowled. This graph is contained within a rectangle that has a length of 300 and a height of 100. The value for the team batting second would be the area under the graph divided by the area above it. The value for the team batting first would be the reciprocal. Using this method, it would be possible for the winning team to come out with the lower value, but no matter: this scheme means that the way to maximise your team's tie-break variable would be to maximise your probability of winning.
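As a sketch, that tie-break value could be computed from the worm as follows. The worm itself would come from a WASP-style model; here it is simply assumed to be a list of the chasing side's win percentages, one per ball, padded at 100 (chase won) or 0 (chase lost) once the match is decided.

    # Adjustment 4: area under the second-innings win-probability worm divided
    # by the area above it, inside the 300-by-100 rectangle. Assumes the worm
    # is neither identically 0 nor identically 100.

    def adjustment4_margins(worm, height=100):
        area_under = float(sum(worm))                  # one unit of width per ball
        area_above = len(worm) * height - area_under
        return area_under / area_above, area_above / area_under  # (chasing side, setting side)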

Adjustment 4 tries to align incentives with the only thing that should ever matter in sport--trying to win--but it doesn't deal with situations like Australia's go-slow against the West Indies in 1999 (or NZ's go-slow against South Africa three years later that shut Australia out of their own tri-series final). The format used in this year's World Cup does not contain the possibility of such strange incentives, but Adjustment 3 would bring that possibility back. With an obvious nod to the Gibbard-Satterthwaite Theorem and Arrow's Impossibility Theorem, then, let me suggest throwing all of these out the window and instead using the following manipulation-proof tie-breaking formula:

Adjustment 5: Rank all teams leading into the tournament based on recent performances. In the event of two or more teams being tied on points at the conclusion of pool play, their relative ranking will be according to their pre-tournament ranking, fully independent of play during the tournament.

* The ICC might argue that they have addressed the problem in a simpler way by restricting the next tournament to only 10 teams. But the results to date in this World Cup suggest that there will still be some very weak teams and non-competitive matches given the non-competitive process for selecting the 10 teams.

Tuesday, 17 February 2015

Aggressive Opening Batsmen in ODIs

After five games in the 2015 Cricket World Cup, an interesting pattern is emerging: so far, the average score of the team batting first has been 323 runs (well above the historical average for ODIs), and the team batting first has won four of the five matches; yet in three of those four wins, it was the losing team that won the toss and sent the opposition in to bat. Captains are likely to see this pattern and adjust their strategy, but I think that would be a mistake.

One of the interesting things for me to watch out for going into this World Cup was the approach taken by both the batting and bowling teams in the opening overs of the first innings. It has become conventional wisdom that in the modern game it is crucial for batsmen to be aggressive from the outset. This view is seen and heard in media commentary, and is revealed as strategy in team selections: after playing down the order last year, Brendon McCullum has returned to the opening spot alongside Martin Guptill, as New Zealand emphasise a high-risk, high-return approach to batting from the outset. Australia are opening with two high-risk, high-return batsmen, and England recently dropped their relatively conservative opening batsman and captain, Alastair Cook, and elevated the more-aggressive Moeen Ali to the top of the order.

But is this conventional wisdom correct? Obviously, faster scoring is better for a batting team than slower, given the same likelihood of losing a wicket, and conversely less risk is better than more for a given strike rate, but what is the trade-off? Peter Miller, aka The Cricket Geek, has expressed the trade-off as follows:
In many ways, getting out for a 15-ball 30 is less of a crime than 65 off 90 deliveries. 
This captures the essential difference between test cricket (most of the time) and limited-overs cricket. In the former, time is not a constraint, so the key to a large score is to not be dismissed; 65 contributes more to your team than 30 in most circumstances. In limited overs cricket, every ball faced is a ball that is not available to your teammates. The opportunity cost of balls used up has to be weighed against the runs scored. But is Peter's summary of the trade-off correct? As it happens, there is a measure of the opportunity cost of wickets lost and balls used up that can answer this question exactly. It is WASP. A player's contribution to his team in the first innings is the amount that WASP advances by on the balls that that batsman faces. Let's imagine that an opening batsman is the first to be dismissed, having either scored 30 runs and faced 15 of the first 30 deliveries, or having scored 65 runs and faced 90 of the first 180 deliveries. Which option will have advanced WASP by the larger amount? Well, that depends on whether the game is played on a high-scoring or low-scoring pitch. In conditions with a par score of 250, the more aggressive opener would have had a net contribution of just under 4 runs compared to just over 11 for the less-aggressive player. In conditions with a par score of 300, however, 30 off 15 would give a slightly negative contribution, but 65 off 90 a more-negative one. The cross-over point is when the par score is 278 (roughly).
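For those who like to see the mechanics, here is a rough sketch of that contribution measure. The expected-runs table V and the ball-by-ball record are assumed inputs for illustration, not actual WASP data.

    # A batsman's first-innings contribution: how much the projected total
    # (runs so far + expected additional runs V[b][w]) moves on the balls he
    # faces, including the drop when he is dismissed.

    def batsman_contribution(deliveries, V):
        """deliveries: list of (balls_bowled_before, wickets_before,
        runs_off_ball, was_dismissed) for each ball the batsman faced."""
        contribution = 0.0
        for b, w, runs, out in deliveries:
            before = V[b][w]
            after = runs + V[b + 1][w + 1 if out else w]
            contribution += after - before
        return contribution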

I reckon that 280 is probably about the average par score for the pitches being played on in this World Cup (the succession of first-innings scores over 300 is misleading, as in every game so far the more-fancied team has batted first), so Peter's example is very finely calibrated but correct.

But there is a seeming inconsistency in the conventional wisdom. At the same time that the consensus is that batters need to be attacking from the outset, media commentary is emphasising the importance of early wickets. And again, this appears to be accepted in team strategies. New Zealand have expressed the intention of attacking from the outset when bowling as well, being prepared to concede runs in the search for wickets. Australia have their bowling spearheaded by Mitchell Johnson who, it is said, may prove expensive but can also destroy a team with early wickets, and so on. That is, the conventional wisdom is both that opening batsmen have to be aggressive from the start and that it is important for the bowling team to secure early wickets. But if the risk-return trade-off is such that the benefit of quick runs to the batting team makes it worthwhile taking the risk of early wickets, then the reverse should be true for the bowling team. In the numerical example above, while it is true that 30 off 15 is a better contribution than 65 off 90 on a 300 pitch, both contributions are negative relative to the average opening-batsman performance.

Actually, the view that bowling needs to be aggressive at the outset is probably closer to the truth than the view that batting needs to be. At the start of an ODI first innings, the cost of a wicket is between 25 and 30 runs, depending on the conditions. As long as a team has wickets in hand, that cost diminishes steadily as balls are used up, and the trade-off between risk and return favours greater aggression. (Of course, balancing that is the fielding restrictions in the first 10 overs, which lowers the risk from fast scoring.)

If this combination of aggressive batting and aggressive bowling/fielding carries through in the World Cup, I expect to see a high variance in first-innings scores: some high scores where the aggressive strategy pays off, and some low ones where rapid wickets impose a large cost. In some cases, when a high first-innings score is achieved, the same strategy will pay off for the chasing team; when a low first-innings score is made, the chasing team should always win by being more conservative. For that reason, notwithstanding the results in the first five games, I still believe that the toss-winner should choose to bat second.* Let's see how it plays out.

* The exception to this rule is when a highly ranked team plays one that is much weaker. In this case, the crazy net-run-rate method for choosing between teams with the same number of wins and losses implies that the better team should choose to bat first, just to make sure that they bat for the full 50 overs and have more weight on that game in the NRR calculations. But that is the subject for a later post.


Tuesday, 2 September 2014

Are ODI Scores Increasing? UPDATED

I had a conversation with a sports blogger, John Rogers, on Twitter last week. John Rogers had tweeted a link to a blog post he had written on why the WASP projection being used in BSkyB's coverage of limited overs cricket this English summer is necessarily inaccurate. His point is that ODI cricket is evolving quickly, both in the equipment and the style of batting, so that historical data is a poor guide to how many runs you can expect a team to score.

There is always a tradeoff in statistical work between using only the most recent data to capture trends, and using a longer time period to get more statistical significance. Now, in principle, since WASP is calibrated to a par score set by the broadcast commentators, any trend in scoring that has occurred within the period of the data used to estimate the model could be adjusted for in the par score. The setting of a par score is both a strength and weakness of WASP. The strength is that it allows game-specific information to be factored into the projections such as using local knowledge to assess how the pitch is likely to play. The weakness, however, is that the commentators might suffer from the common human biases of seeing patterns in essentially random data, and I wonder if the view that batting power is increasing is an example of that.

So I was interested to see if John's perception of a recent increase in scoring rates due to teams having more "lower-order hitters", better bats, etc. is borne out in the data. There is no doubt that there has been an increase in scoring over time. For example, all of the 16 ODI matches (all involving top-8 countries) where the team batting second has scored 330 or more have occurred this century. Only 5 of those 16, however, occurred this decade, suggesting that maybe the changes are not so recent.

Extreme scores like these are not necessarily indicative of a general trend, so some regression analysis is called for. John's hypothesis seems to be mainly based on increased rates of scoring by lower-order power hitters near the end of the innings. I don't have the full ball-by-ball database to hand, just a record of scores and results, but if the theory is correct, it should show up in total scores. Now WASP is currently based on ODI data from 2006 involving the top-8 teams, so I had a look at all non-rain-shortened games involving those teams from May 1 2006, using a dummy variable for each year starting May 1. First, I looked at the evolution of first-innings scores over that time. To control for different abilities across countries, I ran an OLS regression of first-innings score on dummy variables for the team batting first and for the team bowling first, as well as a dummy variable for each of the 8 years in the database. To further control for differences across grounds, I restricted the data set to games played at grounds where there were at least 10 matches played in this period, and included a dummy variable for each ground. This left me with 245 games. The results are shown by the blue line in the graph below, with the line showing (left axis) the average first-innings score for the average team against the average team at the average ground. There clearly has been very little change over these 8 years.
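For anyone wanting to replicate this kind of exercise, the regression is nothing exotic. A sketch using statsmodels, with an assumed match-level data file and column names, would look something like this:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("odi_first_innings.csv")   # hypothetical file: one row per match

    # keep grounds with at least 10 matches in the sample
    counts = df["ground"].value_counts()
    df10 = df[df["ground"].isin(counts[counts >= 10].index)]

    ols = smf.ols(
        "first_innings_score ~ C(batting_team) + C(bowling_team) + C(ground) + C(season)",
        data=df10,
    ).fit()
    print(ols.params.filter(like="season"))     # the year effects plotted below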



John's blog post, however, seemed to refer specifically to the ability of teams to chase down large scores, so I separately looked at whether there has been a change in the probability of the team batting second winning, using a probit regression. Because differences in grounds largely affect ease of scoring in both innings, and because probabilistic models require more data to get precise estimates, I used the full dataset without dummy variables for the ground, but again controlled for team ability and included dummy variables for each year. The results are shown in red on the same graph (right axis). Probabilistic models typically require a lot more data, and so I wouldn't put too much faith in the estimates for any one year. But there doesn't seem to be a clear recent trend towards it being easier to chase down scores than in previous years, although there was a strange dip in the period 2007-2009 that has since been reversed.
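The probit version is much the same; again, the file and column names are assumptions for illustration:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("odi_results.csv")   # hypothetical match-level results
    probit = smf.probit(
        "chasing_team_won ~ C(team_batting_first) + C(team_batting_second) + C(season)",
        data=df,
    ).fit()
    print(probit.params.filter(like="season"))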

I suspect what is happening is a perception bias. There probably has been a recent increase in power hitting as a result of batsmen taking more risks, but that has been balanced by an increase in the rate of dismissals. And this leads to the reality being different from common perceptions. 20-20 has conditioned us into thinking that it is easy to score 8-9 runs an over on small grounds with flattish pitches. And it is. But it requires aerial shots, unlike 5-6 an over, which can be achieved entirely along the ground with 1s and 2s and the occasional bad ball cut or driven for 4 along the ground too fast for the cover sweeper to collect. With modern bats and batting, it is not difficult to sustain 8-9 an over through regular sixes and lofted 4s, but it is hard to do so without losing regular wickets. But wickets arrive randomly. Now think of the commentators' bias. If a batsman hits a clean six, he is lauded for his good shot. If he mistimes it and is caught, as often as not he will be criticised for "taking unnecessary risk" or "not waiting for the right ball". (Have you ever heard a commentator criticise a batsman for taking unnecessary risk after making a clean hit for 6?) This creates an impression that the good shots are normal, and the wickets are just an avoidable failure, rather than both being natural consequences of a particular level of aggression. Combine that with our recollections of past matches. Sometimes a team chasing 120 off 72 balls will have a randomly good passage, scoring at that rate without any lofted shots going to hand, and it will make it look like such fast scoring is easy. At other times, we will see a procession of wickets and we will be thinking how the batsmen threw the game away. It is the first case that sticks in our mind when we make our own assessment of probabilities, and so we inflate in our own minds what the probabilities of winning are when a team is chasing a large total.

Data (even historical data that may become out of date) is a good antidote to these perception biases.

UPDATE: Chris Smith, of the wonderful cricket blog Declaration Game, asks by tweet whether the results would have been very different had I not restricted the sample to top-8 countries and controlled for team ability and ground. That is, would we observe a general increase in scoring, but one attributable to having more games with weak teams and more small grounds? Rerunning the numbers on the first-innings scores, if we include Bangladesh, Zimbabwe, Ireland and Afghanistan in the data (I don't have other countries in my database), and don't control for team ability or for grounds, we see a 10-run increase between the periods 2002-2007 and 2008-2013. Removing the four weaker teams and controlling for team ability only reduces that change by 2 runs. The big change comes when we restrict the data to grounds with at least 10 games in the dataset (still 540 games) and control for the ground. This reduces the change to 2 runs.


Friday, 13 June 2014

Irrational Expectations in Cricket Redux

This post is in part a follow-up to this one from 2012 about irrational expectations in cricket, but is more a response to some recent twitter activity in the U.K. BSkyB have been using WASP in their coverage of the recent ODI and 20-20 series between England and Sri Lanka, and this has provoked some angry twitter comments. Defenders, like David Lloyd or Adam Lewis in this post, point out that a metric like WASP can be very useful for newcomers to watching cricket, giving a sense of who is winning at any particular time and how comprehensively. The idea behind Adam's post is that WASP tells cricket newcomers what experienced watchers already know in their gut. But just how good is the gut of experienced watchers? Well, that is hard to measure, but I think it is reasonable to assume that highly paid captains of international teams probably have at least as good an intuition for the game from being actively involved. So let's look at a very simple decision that captains have to make: whether to bat first or second on winning the toss.

I am currently working on a project with a student from India, Pranav Bhargava, to estimate rankings of teams. In the process we came across the following interesting result: a model that estimates the probability that the team batting second would win an ODI as a function of the quality of the two teams playing fits the data better than one that estimates the probability that the team winning the toss wins the game. Looking at the raw data, we find that the team batting second won 53% of the 1294 games played between May 2002 and May 2014, but the team winning the toss won only 51%. This is a small difference, but it is masked by the fact that the best team over this period, Australia, batted first more often. When controlling for team ability, the difference is more marked.
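The comparison itself is straightforward to set up. Here is a sketch of the two models whose fit we compared, with an assumed data file and column names:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("odi_matches_2002_2014.csv")   # hypothetical match-level data

    m_bat2 = smf.probit("bat_second_won ~ C(team_batting_second) + C(team_batting_first)",
                        data=df).fit()
    m_toss = smf.probit("toss_winner_won ~ C(toss_winner) + C(toss_loser)",
                        data=df).fit()
    print(m_bat2.llf, m_toss.llf)   # higher log-likelihood = better fit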

This makes no sense at all. While the team batting second wins slightly more often than the team batting first, indicating a second-innings advantage on average, the advantage will not apply in every game, depending on the pitch and the abilities of the teams playing. The captain who wins the toss has the option of choosing to always bat second, or to choose to bat first if these game-specific factors suggest that would be better. Accordingly, the team winning the toss should win more often than the team batting second.

O.K. so let’s give the captains the benefit of the doubt. It seems unlikely with such a large sample, but maybe the random toss has, by chance, been won by the weaker team more often than the stronger team. So we investigated this further. We estimated separate team-ability measures for each of the top 11 countries (the top 8 plus Bangladesh, Zimbabwe, and Ireland) for when they won the toss and when they lost it, and found that for some matchups, losing the toss would be preferable to winning it! In particular, three teams—Australia, Pakistan, and Zimbabwe—make the wrong decision according to the data more than 50% of the time, and so would prefer to lose the toss if playing against a clone of themselves. The remaining teams make the right decision more than 50% of the time, but most are sufficiently imperfect that, if playing against Australia or Zimbabwe, they would be better off losing the toss and relying on the opposition to make the wrong decision! Only Ireland out of the top 11 teams has a decision record that makes it desirable for them to win the toss against any opposition.

So far, these results replicate results in Bhaskar (2007), but with a slightly different method, suggesting that the findings are robust. One criticism of both sets of results, however, is that in using the full sample of games to estimate what should be the correct decision, we are using information from matches that would not have been played at the time captains made their decisions. So we divided the data into two eras of 647 matches each. We used the first era to estimate when it would be better to bat first rather than second, and then used this to compare outcomes to predictions in the second era. We find that teams win more than predicted when captains make the right decision and less than predicted when they make the wrong decision. Put another way, the variable on “correct decision” is strongly and positively significant in a regression modelling the probability of success. And this uses only information on how well teams have played batting first and second in the first 6 years of the data to predict outcomes in the second 6 years. Real-world captains have more up-to-date information about how teams are playing as well as information about ground conditions on the day. 

At this point, I can’t see any comeback. The information available to our model is strictly less than that available to captains, yet our model can outperform international ODI captains quite significantly.

So what is going on? I think there are likely two sources of imperfect understanding by captains at play here. The first is that captains forget that this is a zero-sum game. If you are a team that is better at chasing than setting a score, but are playing against a team that is much better at setting than chasing, the optimal decision is to bat first, holding conditions equal. But teams possibly play to their own strengths rather than also considering their opponents' weaknesses. Another possibility, which I suggested in the earlier post, is a version of the regression fallacy: on average, the easier the batting conditions, the higher is the first-innings score. And, on average, the higher the first-innings score, the higher is the probability that the team batting first wins the game, since, on average, higher first-innings scores indicate a better-than-average batting performance. But these two facts don’t in themselves imply that the team batting first has a higher chance of winning when batting conditions are easy.

There are other stories one can tell for the source of the errors made by captains, and we are investigating whether we see in the data what the source is. But the bottom line is that careful data analysis with limited information outperforms professional gut opinion with full information, and by a considerable degree! 

Tuesday, 27 May 2014

FAQ on the WASP

[Update January, 2015: This post has been updated a bit for fans coming to it during the current NZ v SL ODI series.]

This post is written primarily for those cricket fans coming to Offsetting as a result of the WASP being used on BSkyB in Britain to cover the current series between England and Sri Lanka. As with the coverage in New Zealand, it has generated some reaction on Twitter. The aim of this post is to enhance the discussion by addressing some common misconceptions.

What does WASP stand for?
Winning And Score Predictor

How did WASP come about?
WASP was developed by Dr Scott Brooker and his Ph.D. supervisor Dr Seamus Hogan while Scott was studying for an Economics Ph.D. at the University of Canterbury.

What forms of cricket is it applicable to?
Limited Overs Cricket (including One-Day and T20 matches)

What does the WASP number mean in the first innings?
The score that the batting team will end up with at the end of their innings if both teams play averagely from this point on.

What do you mean by "play averagely"?
The model is based on the average performance of a top-eight batting team against a top-eight bowling/fielding team.

What does the WASP percentage mean in the second innings?
The probability that the team batting second will win the match, assuming that an average top-eight batting team is playing against an average top-eight bowling/fielding team.

What does the Par Score mean?
This is a measure of the ground conditions (including pitch, outfield, overhead conditions and boundary size) that exist on the day of the match. It is the average number of runs that the average top-eight team batting line-up would score in the first innings when batting against the average top-eight bowling/fielding line-up. It is decided by expert opinion (often the commentary team or the statistician looking at the history of the ground).

Why have a Par Score?
The average score in an ODI is around 250. Without a Par Score, WASP would predict 250 at the start of the first innings every time. With the Par Score, WASP may start off predicting 200 if batting conditions are very difficult or 300 if they are very easy. The Par Score method is not perfect as it is subjective, but it is a big improvement on simply assuming that all games are played in the same conditions.

Why does WASP change almost every ball?
That's exactly what it is designed to do. Every ball, whether the outcome is a six, a dot or a wicket, affects the likely outcome of the innings and WASP adjusts every time it receives new information with each ball.

If WASP keeps changing its mind, what is the point of it?
It is a measure of how the game is going and allows the viewer to follow the change in projected final total or probability of winning as the performance of the two teams ebbs and flows. It is a measure of which team is winning and by how much at any point of time. It also provides a way of assessing whether a partnership that is progressing slowly without undue risk is contributing to its team's probability of success (by steadily increasing the WASP number) or simply creating pressure for later batsmen. 

WASP predicted that my team had just a 1% chance of winning at one stage. They won. Was WASP wrong?
Probably not. We cannot tell by simply looking at a single match. A 1% chance means that WASP expects that the chasing team will win from that position one time in a hundred attempts. You may have just witnessed that one time. If teams regularly come back from a 1% chance of winning to win the match, then we would be worried about the prediction, but not in the occasional match. In any sport, teams very occasionally make a miraculous comeback and win from a seemingly impossible position. What this means is that they won from a situation where their probability of winning was extremely low. It does happen, but not very often, which is what makes those games so memorable.  

Does WASP consider the skill levels or form of the individual players or teams?
No. WASP predicts based on the average top-eight team against the average top-eight team. If the number-one-ranked team is batting against an associate nation, it is likely that they will do quite a bit better than the WASP projection. The same applies to a player in a very good or very poor run of form. Consider the WASP as a benchmark or starting point, and then adjust up or down based on your view of the relative strengths of the teams and players.

Because WASP doesn't take into account the particular players, it provides a measure of how well particular players are doing. For example, imagine that WASP was sitting at 225 in the first innings when a partnership between, say, Sangakkara and Matthews begins, and has risen to 300 by the time the partnership ends. That doesn't mean that WASP was wrong; rather it is a measure of how well those two batsmen have played (or alternatively, how poorly the opposition team have bowled and fielded). 

So WASP thinks that all players in top-eight teams are the same?
No. WASP knows that opening batsmen are on average different to number 11 batsmen. But it doesn't know that Chris Gayle is not the same kind of opening batsman as Alastair Cook.

Why does the WASP probability differ from bookmaker odds?
There are a variety of reasons for this. The relative strengths of the teams, the subjective opinions of the bookmakers and the balance of their book could all play a role. WASP is a measure of who is winning rather than who is more likely to win. For example, if Australia is playing Ireland, and Ireland gets off to a great start, then Ireland are winning at that time but Australia will still be more likely to win.

Then what is the point of it?
There are a number of sports where a snapshot of the scores at a single point in time does not provide a good indication of who has performed better, and so it is customary to provide more information. In tennis, it is not enough to report that a player is leading 4-3 in a set; we also need to know if the games have gone with serve or if the player who has won 4 games is up a break. In golf, the total number of strokes is never reported for players who haven't finished the round; instead, score updates report the players' deviation from par. That essentially assumes that the player will achieve the par (mode) score on each remaining hole. WASP is a similar reporting of deviation from par. 

I want to follow how my team is doing; what should I be looking for? 
A batting team's likely score in the first innings or their probability of reaching the required target in the second innings always falls when they lose a wicket. This cost is typically much higher at the start of an innings than later on, when wickets in hand may not be so important. To allow for the inevitable fall in WASP when losing a wicket, a batting team should be looking to see WASP steadily increase during a partnership. If the bowling team can keep the increases in WASP low during each partnership and still take regular wickets, it can come out on top in the innings. 

Could WASP be an alternative to Duckworth-Lewis for adjusting targets in rain-affected matches? 
Yes, it could. Sometimes WASP will give a very different answer, partly because it uses a different criterion for fairness and partly because of details in the implementation, including its adjustment for the ease of batting conditions. Most of the time, however, it will produce a similar outcome. For example, in the first ODI between Sri Lanka and England, D/L initially set Sri Lanka a target of 259 as a result of the earlier rain interruption; WASP would have set 262. After the second rain interruption, D/L revised the target to 226; WASP would have set 220.

What if I have more questions?

Comment below, or ask us on twitter @srbrooker and @seamus_hogan.

Tuesday, 18 February 2014

McCullum's 300 and WASP

UC's media consultant loves cricket and so asked me if I could do something WASPish about the probability of McCullum scoring 300 runs in an innings. The resulting media release is here. (The request came before the start of play today when McCullum was already on 281, but I looked at it from the point of view of the start of his innings.) The guts of the conclusion was that you would only expect McCullum to reach 300 once every 4310 innings if he scored with his career average strike rate and dismissal rate, and that is before taking into account how unlikely it would be that the other batsmen around him would survive long enough to not leave him stranded short of the target.
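For what it's worth, the back-of-the-envelope version of that calculation is simple enough to show. The strike rate and average below are placeholders rather than McCullum's actual career figures, so the answer will not reproduce the 4310 exactly.

    # Constant strike rate and constant per-ball dismissal hazard, as in the
    # media release calculation. Both inputs are placeholders.
    strike_rate = 60.0    # runs per 100 balls (assumed)
    average = 38.0        # runs per dismissal (assumed)

    balls_per_dismissal = average / (strike_rate / 100)
    p_out_per_ball = 1 / balls_per_dismissal
    balls_needed = 300 / (strike_rate / 100)
    p_reach_300 = (1 - p_out_per_ball) ** balls_needed
    print(f"roughly one innings in {1 / p_reach_300:,.0f}")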

This was just a bit of fun for a media release. If I was to do the job properly (unlikely, since it is not the sort of thing that results in a peer-reviewed publication), this is what I would like to do:

  1. Do a Kaplan-Meier estimate of hazard rates out of batting as a function of runs scored, rather than assume that a constant strike rate and Poisson dismissal rate apply at all times.
  2. Do a Monte Carlo simulation using McCullum's Kaplan-Meier estimates and other players' dismissal rates to find a combined probability of McCullum and at least one other player surviving until McCullum had scored 300 runs (a rough sketch of this follows the list).
  3. Try to estimate a joint distribution of survival rates and strike rates across games to take into account statistical dependence. 
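A very rough version of items 1 and 2 might look like the following. The hazard function and the partners' dismissal rate are made-up placeholders rather than Kaplan-Meier estimates, and strike rotation is ignored entirely.

    import random

    def mccullum_hazard(runs):
        # placeholder: more vulnerable early on, settling down after 20
        return 0.03 if runs < 20 else 0.015

    PARTNER_HAZARD = 0.02      # placeholder hazard for the other end, per ball
    STRIKE_RATE = 0.6          # runs per ball faced (placeholder)

    def reaches_300():
        runs, partners_left = 0.0, 10
        while runs < 300:
            if random.random() < mccullum_hazard(runs):
                return False                       # dismissed
            if random.random() < PARTNER_HAZARD:
                partners_left -= 1
                if partners_left == 0:
                    return False                   # stranded
            runs += STRIKE_RATE
        return True

    n = 200_000
    hits = sum(reaches_300() for _ in range(n))
    print(f"about one innings in {n / max(hits, 1):,.0f}")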
Above all, I would love to work out a historically applied test cricket WASP in order to rank past innings not only by size, but by importance in game context. What are the innings that have most changed the expected outcome of a match (2 points for a win, 1 for a draw, 0 for a loss)? So many of the very high innings have been in matches that were clearly headed for a draw on a lifeless pitch, or where one team was totally dominant. The really great innings--Laxman (2001), Lara (2003?), Jessop (1902), Botham (1981), Astle (2002) if we had won--are the ones that change the course of the match. 

I don't think this one qualifies for the "future Honours projects" label. Perhaps I need a label "if we had world enough and time". 

Friday, 1 November 2013

More Cricket: The Return of the Wasp

Sky is starting its Friday night coverage of the HRV cup (the domestic 20-20 cricket competition) tonight and will again be using the WASP graphic to monitor teams' progress throughout the match.

There was a lot of traffic coming into Offsetting last year whenever a game using the WASP was playing on Sky, so I thought I should link back to my original post explaining how it works. Also, let me reiterate a few points that I have made in comments elsewhere. 
  1. WASP is a way of calculating who is winning, rather than a prediction of who will win. By way of example, imagine the All Blacks were trailing Ireland 10-9 at halftime in a rugby test. Ireland are clearly winning (they have more points), but the smart money would still be on the All Blacks to win the match based on the past performance of the two teams. Similarly with WASP: it calculates the expected score in the first innings, and the probability of winning in the second innings, if the average batting team were playing the average bowling team on that pitch. A team may have got its nose in front based on this measure, but still be expected to lose based on an assessment of its overall ability compared to the opposition. 
  2. The WASP prediction depends on an input of the par score for the conditions (pitch, ground size, etc.) in which the game is being played. If Sky use the same graphics as last year, you can infer what Sky has chosen as the par score when they put up the WASP worm showing how the WASP prediction has evolved through the innings. The assumed par score is the number that WASP started at when 0 balls had been bowled. 
  3. In the second innings, Sky round off the probability of the batting team winning to an integer percentage, so values of 0 and 100 are possible even if the game hasn't actually been won yet. (Again, this is based on what happened last year; I assume it will be the same this year.)
  4. Things can be messed up a bit if a batsman retires hurt and may or may not return to the crease. This happened in this NZ v England match last year when Guptill retired hurt and then returned late in the innings to deliver a staggering performance for a #9 batsman! 
There is a new feature this year, which is a graphic showing what would need to happen in the next two overs (five overs in a 50-over game) to bring the match back into 50-50 balance--that is, to bring the predicted score to par in the first innings or the probability of winning to 50% in the second innings. 

Thursday, 22 November 2012

Cricket and the Wasp: Shameless self promotion (Wonkish).

[UPDATE: January 2015. The post below dates from November 2012 when New Zealand's Sky TV first introduced the WASP in coverage of domestic limited overs cricket. For fans coming here as a result of its being used in the current NZ v SL series, please see here for an FAQ. For an explanation of what cricket has to do with Economics, see here; and for all the cricket posts on Offsetting Behaviour, see here.]

In their coverage of the Wellington-Auckland game in the HRV cup last Friday, Sky Sport introduced WASP—the “winning and score predictor” for use in limited-overs games, either 50-over or 20-20 format. In the first innings, the WASP gives a predicted score. In the second innings, it gives a probability of the batting team winning the match.

I am very happy about this as it is based on research by my former doctoral student, Scott Brooker, and me. Not surprisingly, the commentators didn’t go into any details about the way the predictions are calculated, so I thought I would explain the inner workings in a wonkish blog post.

The first thing to note is that the predictions are not forecasts that could be used to set TAB betting odds. Rather they are estimates about how well the average batting team would do against the average bowling team in the conditions under which the game is being played given the current state of the game. That is, the "predictions" are more a measure of how well the teams have done to that point, rather than forecasts of how well they will do from that point on. As an example, imagine that Zimbabwe were playing Australia and halfway through the second innings had done well enough to have their noses in front. WASP might give a winning probability for Zimbabwe of 55%, but, based on past performance, one would still favour Australia to win the game. That prediction, however, would be using prior information about the ability of the teams, and so is not interesting as a statement about how a specific match is unfolding. Also, the winning probabilities are rounded off to the nearest integer, so WASP will likely show a probability of winning of either 0% or 100% before the game actually finishes, even though the result is not literally certain at that point.

The models are based on a database of all non-shortened ODI and 20-20 games played between top-eight countries since late 2006 (slightly further back for 20-20 games). The first-innings model estimates the additional runs likely to be scored as a function of the number of balls and wickets remaining. The second innings model estimates the probability of winning as a function of balls and wickets remaining, runs scored to date, and the target score.

The estimates are constructed from a dynamic programme rather than just fitting curves through the data. To illustrate: in the first-innings model, to calculate the expected additional runs when a given number of balls and wickets remain, we could just average the additional runs scored in all matches when that situation arose. This would work fine for situations that have arisen a lot (such as 1 wicket down after 10 overs, or 5 wickets down after 40 overs), but for rare situations like 5 wickets down after 10 overs or 1 wicket down after 40, it would be problematic, partly because of a lack of precision when sample sizes are small but more importantly because those rare situations will be overpopulated with games where there was a mismatch in skills between the two teams. Instead, what we do is estimate the expected runs and the probability of a wicket falling on the next ball only. Let V(b,w) be the expected additional runs for the rest of the innings when b (legitimate) balls have been bowled and w wickets have been lost, and let r(b,w) and p(b,w) be, respectively, the estimated expected runs and the probability of a wicket on the next ball in that situation. We can then write
V(b,w) = r(b,w) + p(b,w) V(b+1,w+1) + (1 - p(b,w)) V(b+1,w)
Since V(b*,w)=0 where b* equals the maximum number of legitimate deliveries allowed in the innings (300 in a 50 over game), we can solve the model backwards. This means that the estimates for V(b,w) in rare situations depends only slightly on the estimated runs and probability of a wicket on that ball, and mostly on the values of V(b+1,w) and V(b+1,w+1), which will be mostly determined by thick data points. The second innings model is a bit more complicated, but uses essentially the same logic.
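In code, the backward recursion is short; the per-ball estimates r(b,w) and p(b,w) are assumed to have already been estimated from the historical data:

    # Solve V(b,w) backwards from the end of the innings, using the recursion
    # above. r[b][w] and p[b][w] are the estimated expected runs and wicket
    # probability on the next ball, for b = 0..299 and w = 0..9.

    BALLS, WICKETS = 300, 10

    def solve_first_innings(r, p):
        # V[b][w]: expected additional runs with b balls bowled, w wickets lost
        V = [[0.0] * (WICKETS + 1) for _ in range(BALLS + 1)]   # V[300][w] = 0
        for b in range(BALLS - 1, -1, -1):
            for w in range(WICKETS):                            # all out stays at 0
                V[b][w] = (r[b][w] + p[b][w] * V[b + 1][w + 1]
                           + (1 - p[b][w]) * V[b + 1][w])
        return V

With V in hand, the first-innings projection at any point is just the runs scored so far plus V(b,w), prior to the normalisation to the par score described below.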

Now many authors have applied dynamic programming to analyse sporting events including limited overs cricket (see my previous post on this here), although I don’t know of any previous uses of such models in providing real-time information to the viewing public. Scott’s and my main contribution, however, is in including in our models an adjustment for the ease of batting conditions. I have previously blogged about our model for estimating ground conditions, here. Without that adjustment, the models would overstate the advantage or disadvantage a team would have if they made a good or bad start, respectively, since those occurrences in the data would be correlated with ground conditions that apply to both teams. Using a novel technique we have developed, we have been able to estimate ground conditions from historical games and so control for that confounding effect in our estimated models.

In the games on Sky, a judgement is made on what the average first innings score would be for the average batting team playing the average bowling team in those conditions, and the models’ predictions are normalised around this information. At this stage, I believe this judgement is just a recent historical average for that ground, but the method of determining par may evolve.

I gather that the intention is to unveil more graphics around the use of WASP throughout the season, with the system fully up and running by the time of the international matches against England. It’s going to be interesting listening to what the commentators make of the WASP. Last Friday’s game wasn’t the best showcase, since when Auckland came to bat in the second innings, their probability of winning was already at 92% and quickly rose higher. It was fun, though, hearing the commentators ask Wellington captain Grant Elliott, who was wired for sound while fielding, what he thought their chances were given that WASP had the Auckland Aces at 96% at that point. Grant's reply was lovely: "Sometimes even pocket aces lose". This is worth remembering when (as will inevitably happen) a team has a probability of winning in the 90s but still goes on to lose.