Saturday 26 July 2014

How unfair is the Super 15 schedule?

In advance of the semi-final between the Crusaders and the Sharks this evening, it is timely to look at the fairness of the Super 15 schedule. The Crusaders are playing at home, a massive advantage that they earned by virtue of finishing one point ahead of the Sharks in the regular season. But was that a fair reflection of the two teams? 

The Super 15 rugby competition is unusually unbalanced. There are five teams from each of three countries. Each team plays the other four teams in its country twice, home and away; it plays four of the five teams from each of the other two countries once, two games at home and two away; and it doesn’t play the remaining two teams at all. This leads to three reasons why a schedule may favour some teams over others: first, teams from stronger countries have to play more games against each other, making it harder for the best teams from those countries to finish ahead of the best teams from weaker countries; second, a team is favoured if the two teams it doesn’t have to play are relatively strong; and third, for the best teams, there is an advantage in playing the stronger teams from other countries at home, to get the benefit of home-field advantage, and playing away against weaker teams, who can be expected to lose in any location.
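
To make the size of the imbalance concrete, the short sketch below simply counts games under the format described above and compares that with a fully balanced home-and-away competition. The variable names are illustrative only; the numbers follow directly from the description of the format.

```python
# Count games under the Super 15 format described above, and under a
# balanced double round robin. Variable names are illustrative.
TEAMS_PER_COUNTRY = 5
COUNTRIES = 3
n_teams = TEAMS_PER_COUNTRY * COUNTRIES               # 15 teams

# Four domestic opponents, each played home and away.
domestic_games = 2 * (TEAMS_PER_COUNTRY - 1)          # 8

# Four of the five teams from each of the other two countries, once each.
cross_border_games = 4 * (COUNTRIES - 1)              # 8

games_per_team = domestic_games + cross_border_games  # 16
teams_not_played = (TEAMS_PER_COUNTRY - 4) * (COUNTRIES - 1)  # 2

# Every match involves two teams, so halve the per-team total.
actual_matches = n_teams * games_per_team // 2        # 120

# Balanced alternative: every team hosts every other team once.
balanced_games_per_team = 2 * (n_teams - 1)           # 28
balanced_matches = n_teams * (n_teams - 1)            # 210

print(games_per_team, actual_matches, balanced_games_per_team, balanced_matches)
```

So each team plays 16 of the 28 games it would play in a balanced home-and-away competition, which is why the identity and venue of the missing games can matter.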

Mark Reason recently had an article in the Dominion Post suggesting that these factors favoured the Crusaders (who finished the regular season in second place overall) in this year’s competition and penalised the Hurricanes (who finished seventh and out of the playoffs). His logic seemed impeccable to me; certainly it seemed that the Crusaders benefited from the luck of the draw this year relative to recent years, when they had to play the best South African teams in South Africa.

I am currently doing some research constructing rankings for international cricket, and thought it would be fun to use the same method to infer how teams would have finished in the Super 15 had they had a balanced schedule. Kirdan Lees has beaten me to it, in a welcome new blog: Sport Loves Data. Kirdan has reevaluated the ranking of the 15 teams, taking into account the imbalance in the schedule, and has posted his results here. Given that Kirdan’s method is very different from mine, I decided to see how the two methods would compare. The table below gives the actual points table, and my revised points table adjusted for schedule unfairness. (The TL;DR explanation of my method is detailed at the bottom of this post.)

Team             Actual   Predicted
Waratahs             58        58.5
Crusaders            51        50.1
Sharks               50        52.4
Brumbies             45        45.0
Chiefs               44        42.6
Highlanders          42        37.7
Hurricanes           41        40.7
Western Force        40        40.2
Bulls                38        38.3
Blues                37        38.1
Stormers             32        33.4
Lions                31        33.8
Reds                 28        24.5
Cheetahs             24        23.9
Rebels               21        23.1

Kirdan's method gives rankings rather than points, so the following table shows just the assumed finishing position: 

Team             Actual   Predicted   Kirdan
Waratahs              1           1        1
Crusaders             2           3        2
Sharks                3           2        4
Brumbies              4           4        3
Chiefs                5           5        6
Highlanders           6          10        7
Hurricanes            7           6        5
Western Force         8           7      11=
Bulls                 9           8        9
Blues                10           9       10
Stormers             11          12        8
Lions                12          11      11=
Reds                 13          13       14
Cheetahs             14          14       13
Rebels               15          15       15

The interesting thing is that Mark Reason’s intuition, and mine, about how much the Crusaders were favoured this year turns out to have been overblown, although my method does rank the Crusaders just behind the Sharks rather than slightly ahead. And yes, the Hurricanes would have qualified for the playoffs as one of the top six teams using either my rankings or Kirdan's. Under my method, though, the reason is not that the Hurricanes were pushed up; rather, the big mover was the Highlanders, who appear to have been hugely favoured by the schedule this year.


Postscript: Kirdan has another post looking at home-field advantage in the Super 15. My probit regression method would require a lot more data to analyse team-specific home-field advantage, but in a model which assumes that the advantage is constant across teams, the result is that home field matters so much in the Super 15 that, in a match between two teams of equal ability, the one playing at home has a 75% chance of winning. It is no surprise that the Super rugby competition has almost always been won by the team that finished first in the regular season, which is therefore not only likely to be the strongest team but also earns home-field advantage throughout the playoffs.
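
To see what a 75% home win probability implies, here is a minimal sketch using a simplified binary win/lose probit rather than the ordered probit on table points described at the bottom of this post. It ignores draws, and the `strength_gap` values are hypothetical numbers on the latent scale, not estimates. It backs out the home-field shift from the 75% figure and then compares how much switching venue changes the win probability in an even match versus a mismatch.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, as in a probit model

# Home-field shift on the latent scale implied by a 75% home win
# probability between two teams of equal ability.
home_edge = std_normal.inv_cdf(0.75)

def win_prob(strength_gap, at_home):
    """Win probability for a team that is `strength_gap` latent units
    stronger than its opponent (hypothetical scale; draws ignored)."""
    shift = home_edge if at_home else -home_edge
    return std_normal.cdf(strength_gap + shift)

def home_swing(strength_gap):
    """Gain in win probability from playing the same opponent at home
    rather than away."""
    return win_prob(strength_gap, True) - win_prob(strength_gap, False)

even_swing = home_swing(0.0)       # equal teams: 0.75 at home, 0.25 away
mismatch_swing = home_swing(2.0)   # hypothetical gap against a weak team

print(round(home_edge, 3), round(even_swing, 3), round(mismatch_swing, 3))
```

The swing is largest when the teams are evenly matched and shrinks as the gap grows, which is the third source of schedule imbalance listed above: a strong team gains more from hosting its close rivals than from hosting weak opponents.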

TL;DR Explanation of Method: 
  • There are two separate LHS variables, each estimated by an ordered probit regression: table points scored by home team, table points scored by away team. Each can take the values 0, 1, 2, 3, 4, 5. 
  • My database only included the scores, not the bonus points scored. The actual points earned by each team for winning, tying, or losing by 7 points or less, can be inferred from the scores, but not bonus points for scoring 4 tries or more. I proxied this by assigning a bonus point if the team scored 30 points or more. The method proceeds as follows: 
  1. Generate a dummy for each of the 15 teams that equals 1 if that team was the home team, and -1 if it was the away team.
  2. Run two ordered probits, one for points scored by the home team, and one for points scored by the away team, in each case run on the 15 dummies (one dropped) and a constant. 
  3. Predict the probability of scoring 0, 1, 2, 3, 4 or 5 points for each of the 210 possible matchups (each team playing each other team home and away), and find the expected points for each.
  4. Then sum these to get the total points in a balanced competition where every team plays every other twice, home and away.
  5. Finally, normalise these by a linear transformation to get the same mean and standard deviation as the actual Super 15 points table.
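
Steps 3 to 5 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used to produce the tables above: the three teams "A", "B" and "C", their coefficients, the cutpoints and the "actual" points are all made-up placeholders standing in for the output of the two ordered probits in step 2, and the helper names are hypothetical.

```python
import math
from itertools import permutations
from statistics import mean, pstdev

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_probit_probs(latent, cutpoints):
    """P(points = 0, 1, ..., len(cutpoints)) for a given latent index."""
    cdf = [norm_cdf(c - latent) for c in cutpoints] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

def expected_points(latent, cutpoints):
    """Step 3: expected table points implied by the probabilities."""
    return sum(k * p for k, p in enumerate(ordered_probit_probs(latent, cutpoints)))

def balanced_table(home_coef, away_coef, home_cuts, away_cuts):
    """Step 4: sum expected points over every home/away pairing.
    The +1/-1 dummy coding from step 1 means the latent index is
    coef[home team] - coef[away team] in each model."""
    totals = dict.fromkeys(home_coef, 0.0)
    for h, a in permutations(home_coef, 2):
        totals[h] += expected_points(home_coef[h] - home_coef[a], home_cuts)
        totals[a] += expected_points(away_coef[h] - away_coef[a], away_cuts)
    return totals

def rescale(totals, actual):
    """Step 5: linear map to the mean and s.d. of the actual table."""
    m_t, s_t = mean(totals.values()), pstdev(totals.values())
    m_a, s_a = mean(actual.values()), pstdev(actual.values())
    return {t: m_a + (v - m_t) * s_a / s_t for t, v in totals.items()}

# Hypothetical placeholder inputs (NOT estimates from the 2014 season).
home_coef = {"A": 0.5, "B": 0.0, "C": -0.5}    # home-points model
away_coef = {"A": -0.5, "B": 0.0, "C": 0.5}    # away-points model
home_cuts = [-0.8, -0.6, -0.4, -0.2, 0.6]      # 5 cutpoints, 6 outcomes
away_cuts = [-0.2, 0.0, 0.2, 0.4, 1.2]
actual = {"A": 10, "B": 6, "C": 5}             # made-up points table

table = rescale(balanced_table(home_coef, away_coef, home_cuts, away_cuts), actual)
for team, pts in sorted(table.items(), key=lambda kv: -kv[1]):
    print(team, round(pts, 1))
```

With 15 teams instead of three, the loop over `permutations` visits exactly the 210 matchups mentioned in step 3.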

14 comments:

  1. You say a team is favoured if the teams it doesn't have to play are relatively weak. Does the point ranking then count a win against a weak team for much less than a win against an average team?

  2. Crusader’s ranked behind the Sharks. Crusaders 38, Sharks 6. Say no more.

  3. Dear economists,


    If I have a statistics blog where a substantial fraction of the traffic is driven by rugby, and there is an increase in the number of other NZ blogs posting intelligent data-based things about rugby, am I correct in thinking this is a Bad Thing for me?


    Are there recognised strategies for encouraging market failure so as to reduce competition? Do any of them work without lots of money? Or am I allowed to just think of the benefits to society as a whole?


    Yours sincerely,


    Statistically Troubled And Tending Slightly to Concerned Hypochondria About Traffic

  4. Hmm. I realised my prior reply potentially broke your careful anonymity and so I have deleted it.


    I note that the NZ econ blogosphere does much better now that it's more than just TVHE, AntiDismal and us.

  5. Does it not depend on whether the other blogs are a complement or substitute for your blog?

  6. Dear STATSCHAT,


    Naturally, a naive economist like me would think only of the social good. But a public choice expert like Eric might be inclined to suggest that the best course of action would be to encourage the government to introduce a licencing regime such that it would be prohibitively expensive for outsiders like me to enter into your space!


    Best of all, however, would be to note that there are positive spillover benefits in blogging, so that more arrivals in this space is not necessarily a bad thing. Actually, I genuinely had intended to reference the rugby rankings on a blog coincidentally called Statschat. Their and my rankings are asking a different question and hence use a different method but do complement each other.

  7. See my point about home-field advantage. My model had the Sharks ranked ahead of the Crusaders in a balanced competition, but the Crusaders heavily favoured on Saturday night, simply because they were playing at home. And that was without the model estimating the additional home-field advantage that comes when the opposition has had to travel between South Africa and New Zealand.

  8. No, the point ranking doesn't make a distinction. But home-field advantage will have a bigger impact on the probability of winning the closer the two teams are in ranking. It is to a good team's advantage to reduce its chance of winning against a weak team from 95% to 90% while increasing its chance against a strong team from 25% to 75%, simply by changing which one it plays at home.

  9. It would be interesting to split that home-field advantage into a) the amount that the home team plays better, and b) the amount that the referee favours the home team. It could probably be done by looking at the key stats and isolating the impact of penalties on the final result, then seeing if home teams receive more penalties than the other stats would suggest that they should.

  10. But Kirdan tells us

    "And the Crusaders? Don’t put down your house on them winning at home. Home form has been woeful and the Sharks classy away "



    So the Crusaders don't have much of a home-field advantage. So that can't be the story.

  11. You need to explain to me slow-like why it's better not to have racked up more wins by playing the real easy teams.

  12. That's not the right thought experiment. Yes, it is better to have a schedule that has you playing weaker teams. That is one of the sources of imbalance. But now, let's say that you are a top NZ team and it is given that you will be playing, say, both the Waratahs (Australia's top team) and the Rebels (their weakest team), one at home and one away. Which one would you rather have your home game against? Now think probit. Gaining home-field advantage pushes you a distance to the right on the latent variable axis (horizontal); losing it pushes you the same distance to the left. If you are near the mean (which is the case when two teams are of roughly equal strength), the slope of the cumulative normal is high, so the change in probability from a given change in the latent variable is high; when you are far from the mean (one team is much stronger than the other), the change in the probability is small. So you want the evenly matched game at home and the mismatch on the road.

  13. So I should have read "second, a team is favoured if the two teams it doesn’t have to play are relatively weak" to mean "have to play AT HOME are relatively weak"?

  14. Ahhrgh. I completely misread your original question. Now I see the typo in my original post. Now corrected.
