Tuesday 27 July 2010

Dynamic Programming in Sport

The dialogue in the comments to my post on red and yellow cards in rugby is the motivation for this rather geeky post on the application of dynamic programming in sport.

Dynamic programming is a wonderful mathematical tool, used a lot in economics, finance, operations research, and many other applications. In essence, it deals with situations where the value of an action can come partly from a direct payoff to that action and partly from putting oneself in a better position to gain a payoff in the future. In DP analysis, the expected value of being in a particular position (called a "state") is equal to the expected immediate payoff from taking the optimal action in that state plus the expected value of being in the new state that results from the action. The value of any state is then a function of the values of other states, and these values can be found either by working backwards in time from a well-defined end, or by solving a set of simultaneous equations.
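For readers who like to see it written down, the verbal definition above can be sketched as a single recursive equation. The notation is an assumption of this sketch rather than anything taken from a particular model: s is the current state, A(s) the set of actions available in it, r(s, a) the immediate payoff from action a, and s' the (possibly random) state that follows.

```latex
\[
V(s) \;=\; \max_{a \in A(s)} \Big\{ \mathbb{E}\big[\, r(s,a) \,\big] \;+\; \mathbb{E}\big[\, V(s') \mid s, a \,\big] \Big\}
\]
```

Working backwards from a well-defined end means starting with states whose value is known (the final whistle, say) and applying the equation one step at a time; where there is no such end, the same equation written out for every state gives the set of simultaneous equations.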

DP analysis is particularly well suited to the analysis of sport, which contains many examples where the objective on any particular play is not to gain a payoff (i.e. score points) but to put oneself in a better position to score points later on.

Among many examples, DP analysis has been used to analyse whether one should kick away possession on 4th down in American football; to model the behaviour of tennis players in choosing the optimal amount of aggressiveness on first serves compared to second serves; and by my doctoral student, Scott Brooker, to model the trade-off between fast scoring and wicket preservation in ODI cricket. The analysis of the first of these examples was famously used by football coach and economics graduate, Bill Belichick, helping him to win three Super Bowls with the New England Patriots. (HT: Mankiw.)

The principles of dynamic programming are naturally understood intuitively by sportsmen, albeit with consistent sub-optimality in their application in some contexts. They are implicit in the old rugby adage never to pass to a player in a worse position than oneself. And I believe that the famous (or notorious) long-ball game that Graham Taylor brought to the Watford soccer team in the 1970s was motivated by the observation that almost all goals in English club games were scored within a very small number of passes of the team gaining possession. In DP terms, this translates as follows: if you play a short-passing game, the value to you of having possession at your end of the field is less than the value to you of the opposition having possession at its end, so you might as well just kick for territory, not possession.

So what is the relevance of dynamic programming to my suggested rugby rule change?

I suggested a rule change that would increase the value of having possession inside the opposition 22, since the payoff to taking a shot at goal from a penalty would increase. The comments were then that the rule change would have no effect on infringing further out from the goal line, since there is no feasible option to take a shot at goal. This is not correct, however. Let's say you are awarded a penalty on the halfway line, and don't have a kicker you would trust to kick for goal from there. The best option with and without the rule change would be to kick for touch, from which there is a roughly 90% chance you will regain possession at the line-out. But that doesn't mean that a penalty has the same benefit as before. Let's say a kick for touch can advance your team 25 metres, so a penalty at halfway would earn you almost certain possession around the opposition 22. If the cost of infringing inside one's own 22 is raised, then the value of holding possession inside the opposition 22 is increased, and hence so is the value of a penalty on halfway. Similarly, the value of a penalty on your own 22 is increased (albeit with diminishing returns) as it enables a team to get almost-certain possession at the half-way line, etc.
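To make the knock-on effect concrete, here is a minimal sketch in Python of the argument above. Only the 90% line-out figure comes from the text; everything else is a made-up assumption for illustration: the three-zone field, the function name `penalty_values`, the chance `q_advance` that a team in possession works its way one zone upfield (with a turnover valued at zero for simplicity), and the possession values of 2.0 and 3.0 points inside the opposition 22 before and after the rule change.

```python
# Zones ordered from a team's own end towards the opposition goal line.
ZONES = ["own 22", "halfway", "opposition 22"]


def penalty_values(v_opp22, p_lineout=0.9, q_advance=0.5):
    """Value of a penalty in each zone when the best option is a kick for touch.

    v_opp22   -- assumed value (in points) of possession inside the opposition 22
    p_lineout -- chance of regaining possession at the line-out (90% in the post)
    q_advance -- hypothetical chance that possession advances one zone upfield
    """
    # Backward induction: start from the zone whose value we know and
    # work back down the field, one zone at a time.
    possession = {ZONES[-1]: v_opp22}
    for i in range(len(ZONES) - 2, -1, -1):
        possession[ZONES[i]] = q_advance * possession[ZONES[i + 1]]

    # A penalty is kicked to touch, moving play one zone upfield, and is
    # worth the value of possession there if the line-out is won.
    return {
        ZONES[i]: p_lineout * possession[ZONES[i + 1]]
        for i in range(len(ZONES) - 1)
    }


before = penalty_values(v_opp22=2.0)  # hypothetical value under current rules
after = penalty_values(v_opp22=3.0)   # hypothetical value under proposed rules

for zone in before:
    gain = after[zone] - before[zone]
    print(f"penalty at {zone}: {before[zone]:.2f} -> {after[zone]:.2f} (gain {gain:.2f})")
```

Under these assumptions the penalty on halfway becomes more valuable even though the decision (kick for touch) is unchanged, and the penalty on the team's own 22 gains too, but by less, which is the diminishing-returns point.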

Maybe they should be teaching dynamic programming at our sports academies?


  1. I'm not sure I see how the value of a penalty is increased outside goal range. Don't both rule sets (current and proposed) result in a kick for touch followed by a lineout? Call me crazy, but without a card system the cost to the infringing team of giving away a penalty is less outside goal range, as the current outcome applies, but without the risk of a player being sin-binned for 10 minutes. Or have I missed something?

  2. @Lats

    That is exactly the point about dynamic programming. Let's say we are comparing two sets of rules, the current rules but without red and yellow cards, and my proposed changes.

    We agree that the changes make infringement within your own 22 less profitable. That means that having possession within the opposition 22 has a higher expected value. So now, for an infringement on the half-way line, although the outcome is the same under either set of rules (a kick for touch), the value of that outcome to the kicking team is higher since the value of having possession inside the opposition 22 is higher. Thus, there is less incentive to give away that penalty.

    Now I agree that the cost of the penalty is lower the further one is away from the goal line, but so is the benefit from infringement. (The referee said as much last weekend, in explaining why McCaw was not carded for his 4th penalty--his infringements were not inside his own 22 and so not so deserving of sanction.)

    To give you another example, why have the new tackle interpretations this year heavily reduced the amount of aerial ping pong being played? Well, obviously, the benefit of running has increased and so teams take that option rather than kicking. But even if you were a team (South Africa?) who could not successfully run and retain possession even under the new rules, you would still have an increased incentive to do so this year. Why? Because although running would still have the same value for you in my thought experiment as previously, and kicking would achieve the same outcome--opposition possession deep in their territory--the value to the opposition from having possession is now higher as they are now able to run. Therefore, even in my extreme thought experiment where the rule changes benefit only one team when running the ball back, the effect would be to reduce kicking by both teams.

  3. Yep, I get it now. I wasn't including the follow-on advantage of territorial gain, i.e. was discounting future gains resulting from infringing by the defending team closer to their own goal line. Instead I had made the mistake of treating each penalty occurrence as a separate, unrelated mini-slice of the game, so I wasn't seeing the cumulative gains to be had.
    You still haven't convinced me that the game would be better off without cards, but I do like the idea of being able to replace carded players, effectively forcing an early substitution. This might require raising the threshold for cards being given, but for minor-ish penalties that would previously have been yellows the ref could always march the offending team back 10 yards, as occasionally happens when they backchat him.