Flippant Accusations

It seems to have begun with a story in the Boston Globe mentioning that the New England Patriots won the coin toss at the start of their football game, 19 out of the last 25 times. The story just pointed out that the Pats had been lucky in the coin-toss department, and discussed their strategy when they do so.

But CBS Sports decided to call it an “impossible clip.” The insinuation of cheating was evident. It didn’t take long for NESN and Boston.com to jump on the accusation bandwagon. Hell, the innuendo has even spread to the Charlotte Observer.

These stories only prove two things. 1: Prejudice, meaning “pre-judice,” i.e. judge first, investigate later; 2: When it comes to statistics, a little bit of knowledge is a dangerous thing.

The “impossible rate” idea is based on the fact that if you flip a fair coin 25 times, your chance of getting 19 or more “heads” (or tails if that’s what you like to call) is 0.00731665. It’s a straightforward application of the binomial distribution, something those with a little knowledge can do. Since that’s about one chance in 137, let’s start spreading rumors with made-up names like “flip-gate” and “coin-gate” — they must be cheating, right?

Wrong. As someone who has a lot of knowledge about statistics, I can tell you there are a bunch of problems with this “analysis.”
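For the record, the arithmetic itself is easy to reproduce. Here is a minimal sketch in Python, assuming a fair coin and independent flips; the helper name `binomial_tail` is just an illustrative choice, not anything from the news stories.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 19 or more wins in 25 fair coin tosses
prob = binomial_tail(25, 19)
print(f"P(19 or more of 25) = {prob:.8f}")  # about 0.00731665
print(f"roughly 1 in {1 / prob:.0f}")        # about 1 in 137
```

So the number is right. The interpretation is where things go wrong.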

Let’s start with the fact that 1 out of 137 is a far cry — a very, very, very far cry — from “next to impossible.” Some journalist making that kind of exaggeration isn’t a far cry either — it’s par for the course.

Let’s mention that there are 32 teams in the NFL. If one of them flips a coin 25 times, the chance of 19 or more “heads” (or tails if that’s what you prefer) is about 1 out of 137. If all 32 of them flip a coin 25 times each, what are the odds that at least one team will get 19 or more? A helluva lot greater than 1 out of 137.
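A rough sketch of that calculation, under the simplifying assumption that the 32 teams' results are independent of each other. That isn't strictly true in the NFL, where one team's toss win is another team's loss, so treat the result as a ballpark figure only. It uses the 19-of-25 condition, and the function and variable names are illustrative.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

single_team = binomial_tail(25, 19)   # one team: about 1 in 137
teams = 32

# Chance that at least one of 32 (assumed independent) teams gets 19 or more
at_least_one = 1 - (1 - single_team) ** teams
print(f"at least one team: {at_least_one:.3f}")  # roughly 0.21
```

That is about one chance in five, which is not anyone's idea of impossible.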

The biggest problem of all is cherry-picking. I have no blame to lay, or fault to find, with Jim McBride at the Boston Globe; he was just pointing out a streak of good luck. But the other idiots ran with it, without even thinking about something that’s kinda obvious to those of us who know a lot about statistics. Namely, this: that when someone says “19 out of the last 25” it’s overwhelmingly likely that the 26th was not.

Chances are, McBride picked 25 because that was a run of good luck. But if the Patriots had also won the 26th, he’d have talked about 20 out of 26, not 19 out of 25. He picked 25 because it was the run of good luck. Statistically, when you choose your sample because of the result it gives, it’s called “cherry-picking.” The salient point is that it throws off the statistics.
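To get a feel for how much this kind of selection matters, here is a small simulation sketch. It assumes a fair coin, and it assumes (purely for illustration) that a writer is free to quote any lookback window from the last 10 to the last 40 games, choosing whichever looks at least as striking as 19 of 25. The window range, trial count, and function names are all made up for this example.

```python
import random
from math import comb

def binomial_tail(n: int, k: int) -> float:
    """Probability of k or more heads in n fair coin flips."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

def chance_of_impressive_streak(min_n=10, max_n=40, trials=20000, seed=0):
    """Estimate how often SOME lookback window looks as unlikely as 19 of 25."""
    threshold = binomial_tail(25, 19)
    # For each window length, the fewest wins that look at least that unlikely
    needed = {
        n: next(k for k in range(n + 1) if binomial_tail(n, k) <= threshold)
        for n in range(min_n, max_n + 1)
    }
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        wins = 0
        # Walk backward from the most recent game, one toss at a time
        for n in range(1, max_n + 1):
            if rng.random() < 0.5:
                wins += 1
            if n >= min_n and wins >= needed[n]:
                hits += 1
                break
    return hits / trials

print("fixed window of 25:  ", binomial_tail(25, 19))
print("best window, 10 to 40:", chance_of_impressive_streak())
```

The second number comes out noticeably larger than the first, because the window was chosen after seeing the data.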

That’s an issue I’ve discussed often, in relation to climate data. Climate deniers do it all the time, for the purpose of giving the wrong impression. Because it does.

I doubt the Boston Globe article was trying to give the wrong impression. Nor did they; the Patriots have indeed had a run of good luck. But those others ignored the hard part of statistics, probably because they don’t really know what they’re doing. They saw a chance to impugn the New England Patriots, and they jumped on it.

Back in 2011, the New York Times mentioned in an article that 11 games into the season, the Cleveland Browns had lost all 11 coin flips. They also pointed out that the probability of that is a mere 1 out of 2048. But to their credit, they did not have the temerity to accuse the rest of the NFL of some giant conspiracy against them. Hey — maybe they even have some people working for them who do know more about statistics than a little.

42 responses to “Flippant Accusations”

  1. xkcd.com/882

  2. I’m a broken record, but analogies to sports stats are an excellent way to reach an audience that’s being tricked on climate change.

    This post, maybe slightly modified to drive home the cherrypicking on climate change, would make a great Op-Ed at one of the linked media sources.

  3. Anybody else thinking of Act I “Rosencrantz and Guildenstern are Dead,” by Tom Stoppard? I mean, they got to 92 “wins” in a row before they started positing supernatural forces at work…

    • I wasn’t, but now I am… loved that play.

      • My Uncle is in theatre. There is another play based on the Bard’s account of the Prince of Denmark where the play opens with Hamlet being fired for generally acting like an ass, and they proceed to do the play without him.

        I don’t remember the name of the play.

  4. Got it. Kind of like estimating the likelihood of getting 14.39 inches in a single day at a single rain gage. That turns out to be a once-in-40,000-years event. Of course considering the total number of rain gages in the world, the likelihood of it happening somewhere is a helluva lot greater than one in 40,000.

    • Well, except that if you look at all the precipitation data, you find that large impulsive precipitation events are rising all over the planet, so you’re still full of fetid dingo kidneys.

    • And precipitation probabilities are calculated for specific locations based on long records for that location.
      The probability for 10 inches of rain in a day in the Atacama desert in Chile would be much, much, much more of an outlier than 10 inches of rain in Mawsynram, Meghalaya State in India.

  5. England Cricket captain Nasser Hussain lost 15 coin tosses in a row during his stint as captain (the team won at least one toss during that period, when Hussain was replaced as captain due to injury). And then lost a coin toss again while appearing on TV when this fact was mentioned. He always called tails. I make that odds of 1 in 32768 (obviously ignoring cherry picking the start of the sequence, as well as the TV show, which would make it 1 in 65536).

    This was preceded by a run of losing 8 out of 10 tosses, but I can’t figure out if these runs overlapped (I’m not going to spend long enough on cricinfo to work that out). Worst case is that he lost 23 out of 25 tosses. Of course, this wasn’t at the start of his captaincy, so all the cherry picking above applies.

    Of course, there are an awful lot of coin tosses used in national level sports competitions, and there have been a lot of captains.

    • OK, so I actually am bored enough to spend enough time on cricinfo to work out the sequence here. Hussain lost 21 out of a sequence of 23 tosses between Aug 2000 and Dec 2001, with odds of 1 in 30,000 (ignoring the cherry pick as is done above to get 1 in 137). In his whole career as captain he won 46% of the tosses (in 101 matches, giving an expected variance of 25). So there’s no evidence of bad luck or “anti-fixing” overall, despite the terrible run.
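The figures in this comment can be checked with a few lines of Python. This is only a sketch assuming fair, independent tosses; the career win count is taken as roughly 46% of 101, which is an approximation since the exact count isn't given here.

```python
from math import comb, sqrt

def binomial_tail(n: int, k: int) -> float:
    """Probability of k or more of one outcome in n fair coin tosses."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# Losing 21 or more of 23 tosses
p_run = binomial_tail(23, 21)
print(f"1 in {1 / p_run:.0f}")            # about 1 in 30,000

# Whole career: 101 tosses, about 46% won (approximate count)
n = 101
variance = n * 0.5 * 0.5                  # binomial variance n*p*(1-p), about 25
z = (0.46 * n - n / 2) / sqrt(variance)   # shortfall in standard deviations
print(f"variance = {variance}, z = {z:.2f}")
```

The career shortfall is under one standard deviation, consistent with the conclusion that there is no evidence of bad luck overall.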

  6. Speaking of knowing a ‘lot’ about statistics, the latest bulletin from Blatherskite Heights has been posted at WUWT, entitled “The Pause lengthens again – just in time for Paris”, in which C. Monckton advises that “The hiatus period of 18 years 9 months is the farthest back one can go in the RSS satellite temperature record and still show a sub-zero trend. The start date is not cherry-picked: it is calculated…And yes, the start-date for the Pause has been inching forward, though just a little more slowly than the end-date, which is why the Pause continues on average to lengthen.”.

    [Response: Leave it to Monckton to apply the *definition* of cherry-picking and claim it’s not cherry-picking.]

    • They’re hanging on to the RSS satellite data (has the UAH data now succumbed to reality?), because they’ve got nothing left. Once that’s shown (to them) to be a) not a record of surface temperature or of total heat content and b) riddled with errors, they’ll have nothing and will dissipate into the ether.

    • Each time Monckton issues a new version of his “The Pause lengthens yet again”, he dresses his graph up with technical minutiae such as “Trend -0.00 C° (-0.01 C°/century)” and “r² = 0.000”, but the one piece of information he never supplies is the average temperature during the entire pause (the graph’s Y-average). The reason he doesn’t do this is because that average keeps increasing from version to version, as the start date of the pause inches forward. This is very easy to verify using the downloadable RSS data. For example, when the “pause” was 17yrs 10mos (from October 1996 to July 2014), its average temperature anomaly was 0.2341; when it was 18yrs 5mos (from December 1996 to April 2015), the Y-average was 0.2363; and when it was 18yrs 8mos (from January 1997 to August 2015), this average rose to 0.2394. In other words, the average temperature of Monckton’s pause, which is supposed to show no warming, is itself warming.

    • I have to admit, Monckton is a genius. To find a silver lining as the non-hiatus ends so spectacularly marks him out as extraordinary.

  7. I didn’t get the impression the sports writers were taking this too seriously, and thought they were having a bit of a laugh. Compared to most journalists (a low bar, admittedly) they are reasonably comfortable with basic statistics and likely realized that winning 19 (instead of 12 or 13) of 25 coin tosses was not exceptional in a 32-team league.

  8. B Buckner, if the level of CO2 in the atmosphere changes, then the odds of getting that amount of rain at a particular location in a particular season change. At 350 ppmv CO2 the odds of 14.39″ in one day may be 1 in 14,600,000 days (40,000 years), but at 410 ppmv CO2 the odds may be 1 in 365 days, and at some point with loss of sea ice there will be very different global circulation patterns and Austin will have very different weather.

    Given the current rate of change of CO2 concentrations in the atmosphere, many weather events that never before were observed at that location will become common, and many previously common kinds of weather events will never again be observed.

  9. Eli used to earn extra money refereeing. The referee provides the flipper and flips the coin so if a team carefully monitored the heads/tails ratio for a particular referee, yes, there could be an advantage

  10. If 32 fair coins are each fairly tossed 25 times then the probability that at least one of them comes up heads at least 19 times is about 21%, or almost 1 in 5.

  11. In high school, a friend and I, presumably bored, decided on a teenage whim to test the laws of probability via repeated coin flips. We had some good runs on both sides of the ledger, but in the end, we came pretty darn close to 50/50, just as one would expect.

    The lone anomaly was that we did log one coin on edge–which, as I recall it, was directly related to a distinct lack of even, level surfaces in my friend’s room.

    • David B. Benson

      Not just once, but twice and in the same hallway, dropped a quarter to have it land on edge and roll all the way down the length of the hallway.

  12. If I may enter a humorous comment, and in line with the observation of a certain cricket captain “He always called tails”. How many of these games were played in Australia? And did a smart local insist that an Australian $2 coin be used for the toss? (that coin has a head on both sides :-)

    Seriously, I have been a long time reader here Tamino, but this is my first comment. Love your work, not just for the excellent statistical explanations for those of us with some but limited knowledge in the area. About to finally also get around to giving you some financial encouragement.

    And as a fan of the use of relevant analogies, surely this comparison will give any Patriot supporting climate change “sceptics” pause for thought… (pun intended)

  13. What about the previous 25 coin tosses, and the previous 25, and so on? Also, I don’t think it matters what you call for each flip; no need to stick to the heads or tails to get sequences of wins or losses.

    • As long as the odds are .5, it doesn’t matter. If the odds are not .5, then always guess the option above .5. I mention this because psychology experiments long ago showed that untrained humans usually adopt a “probability matching” strategy which is actually less successful. That is, if the odds of a success were 60-40, untrained humans often spread their guesses 60-40 which leads to less than 60% successes. When I say “untrained” I mean most people including professionals in most fields.

      This is just another aspect of the human propensity to see patterns where none exist and as such is related to cherrypicking bias.
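The gap between the two strategies is simple arithmetic. Here is a minimal sketch, assuming independent trials with a known success probability `p`; the function names are illustrative.

```python
def matching_accuracy(p: float) -> float:
    """Guess the likelier option with probability p, the other with 1 - p."""
    return p * p + (1 - p) * (1 - p)

def maximizing_accuracy(p: float) -> float:
    """Always guess whichever option is likelier."""
    return max(p, 1 - p)

p = 0.6
print(f"probability matching: {matching_accuracy(p):.2f}")   # 0.52
print(f"always pick likelier: {maximizing_accuracy(p):.2f}") # 0.60
```

With 60-40 odds, matching gets about 52% right while always picking the likelier option gets 60%. At exactly 50-50 the two strategies tie.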

    • “theses” of course, not “these”

  14. I look after a large university unit, where we have a multiple choice test: 15 questions, each with 5 choices. And one unlucky student managed to get all 15 wrong. If you guess, your chances are 3.5%, but one has to presume the student was trying to choose the right answer each time.

    • If this were more extreme (more questions) you might need to interpret this as an indication that the student actually knew the right answer so as to avoid it!!!

      Or, it can actually be an artifact of the test construction process itself. In a former life I dealt with psychometric issues a lot and found this a few times occurring as a result of bad “foils”–the technical term for wrong answers. One example in particular was a test that was constructed such that for each and every question there was one foil that was simplistic but in the general ballpark and then a right answer was more complex but precisely on point. Some students took the simple, somewhat correct foil every time. Now if all the foils would have been at the same level of complexity as the correct answer, this would not likely have happened except as allowed by chance.

      • Back when I was setting tests, I discovered that a surprising number of students tended to overlook the word ‘not’ when reading multiple choice questions–with predictably unpleasant results.

  15. This reminds me of a George Gamow piece I read as a kid about a scheme to break the bank at Las Vegas. The plan was to bet a dollar on red on the roulette table, and to double the bet each time you lost. Eventually black HAD to come up and you double your money, recouping all previous bets, plus one dollar. Just keep repeating and banking the one dollar wins, until you reach your target – say $50.

    Gamow argued the strategy gave no better chance of winning $50 than if you just put $50 on red to start with. .5 in fact. I can’t find the book on my shelves now to check his reasoning. Though I can see you’d need seriously deep pockets to make this work, given the joys of exponential growth of wagers.

    [Response: It’s an old strategy, and people love it because it truly makes money for them. The people I’m talking about are Vegas casino owners.]

    • A difference between Vegas and European roulette tables is that the former have 0 and 00, the latter only 0. Since both lose if you are playing red or black, the strategy loses. Also, you can run out of money on a long streak.

    • Ah yes, the Martingale.

      I lived in Amsterdam for a while about 30 years ago, and the casinos there didn’t have enough seats at the table for all the people who wanted to play blackjack, so they used to allow observers to bet on the people who were playing :-)

      So what I used to do was pick out a player that looked like they knew what they were doing, and wait for them to lose a few hands in a row. Then I would start betting on them with the table minimum which was 10 guilders at the time. As you may know, the main problem with the Martingale is that you must double your bet each time you lose just to recoup your original bet, and if you hit the table limit, then you can never recover your original bet and you have lost. Big time. In this case, the limit was 500. So the progression was:

      10 20 40 80 160 320

      And if you lost the 6th bet, you were then screwed because it was not possible to bet 640. However… I played all night long like this on many occasions, and invariably walked away with about 200 guilders each time. I did get quite a few stares from the other players whenever I had to plonk down 160, but I never had to go to 320. Of course, I was probably just lucky. But IIRC, I always waited for the player I was betting on to lose at least 4 hands in a row before I started betting. So that means he/she would have to lose 10 hands in a row before I was done for.

      The downside: it was pretty boring. Also, I’m sure if you tried this too many times at the same casino, they would inevitably kick you out.
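For anyone curious about the arithmetic of this progression, here is a rough sketch. It makes a strong simplifying assumption: each hand is an independent even-money bet with a fixed chance of losing (`p_lose`). Real blackjack has pushes and 3:2 payouts, so this is not a model of the actual game. The function name and defaults are illustrative.

```python
def martingale_cycle(base_bet=10, table_limit=500, p_lose=0.5):
    """One doubling cycle: stop at the first win, or when the limit is hit."""
    bets = []
    bet = base_bet
    while bet <= table_limit:
        bets.append(bet)
        bet *= 2
    p_bust = p_lose ** len(bets)   # lose every bet in the progression
    total_loss = sum(bets)         # amount lost when that happens
    # A win at any step nets exactly base_bet; a bust loses everything staked
    expected_profit = base_bet * (1 - p_bust) - total_loss * p_bust
    return bets, p_bust, expected_profit

bets, p_bust, ev = martingale_cycle()
print(bets)                      # [10, 20, 40, 80, 160, 320]
print(f"bust chance per cycle: 1 in {1 / p_bust:.0f}")   # 1 in 64
print(f"expected profit per cycle: {ev}")                # 0.0 for a fair game
print(martingale_cycle(p_lose=0.52)[2])                  # negative with a house edge
```

Under these assumptions the many small wins are exactly offset by the rare loss of 630 when the game is fair, and the expectation turns negative once there is any house edge. If hands really are independent, waiting for four losses first does not change the chance of the next six going wrong.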

  16. This omits the incredible record put up for the biggest coin toss of all- at the start of the Super Bowl. From 1998 to 2011, the NFC won the coin toss every single year, which should be a probability of 1 in 16,384.

  17. Tamino – why not go all the way and calculate the odds for all of the 32 teams flipping the coins? I’m puzzled why you finished with ‘A helluva lot greater than 1 out of 137.’

    I calculated that the odds are a little greater than 1 in 5 and far from ‘almost impossible’ as you say.

  18. Tamino – I don’t understand your calculation of 1 in 137 chance. The odds of getting exactly 19 heads (or tails) in 26 coin tosses is about 1 in 102. The odds of getting 19 or more heads (or tails) – which is what you and the journalist are actually referring to I assume – is approximately 1 in 69. Hence the odds of at least 1 team out of 32 teams tossing 26 times is greater than 1 in 2.68. Hence, every two and two-thirds years on average you would expect at least one team to win 19 out of 26 coin tosses. Not a remarkable event at all.

    [Response: The stated condition is 19 out of 25 (not 26). I reproduced that calculation in order to demonstrate that others didn’t get the arithmetic wrong, they got the interpretation wrong.

    I didn’t bother with the full calculation for 32 teams doing the same because in the NFL, each flip that’s a win for one team is a loss for the other. Hence their sequences of flips are far from independent, which makes it a bit more complicated.]