Russian Roulette

(Note: this is chapter 1 of my book “Noise: Lies, Damned Lies, and Denial of Global Warming.” It’s not one of my usual technical-type posts, it’s an attempt to illustrate for the general reader how statistics can be misused to deny global warming. The data used are several years out of date, but the point is still quite valid, and at the end I’ve added a graph of more current temperature data. Feel free to refer your friends to this.)


Most of us have heard of the game called “Russian Roulette.” A revolver (usually with 6 chambers) is loaded with only one bullet; five chambers are harmless but one is lethal. The cylinder is spun so the location of the live cartridge is randomized, then the players take turns putting the gun to their head and pulling the trigger. Sometimes the cylinder is spun again before each turn, sometimes not. Whoever is unlucky enough to happen on the “live” chamber gets a bullet to the head and almost certain death; the other players are the “winners.”


If the spin of the cylinder is truly random, then on the first turn each chamber has an equal chance of being the one fired. With six chambers and only one bullet, that means there’s a 1-out-of-6 chance the unlucky first player pays the penalty on the first round. There’s also a 5-out-of-6 chance the first player survives the first round. To some people that might seem like pretty good odds. If the payoff for survival is high enough, they might even consider the game worth playing. Of course, the penalty for losing is about as big as it gets, and most people consider Russian Roulette a fool’s game.

In fact most rational people understand that in spite of the much-better-than-even odds of surviving the first round, sooner or later someone is going to pay. If you take multiple turns, over the long haul the odds are against you. If you keep playing long enough, the odds of failure are so high that the big penalty becomes, effectively, inevitable. Over the long haul, you’re sure to lose.

This is a case where almost everybody understands that in spite of the randomness in the game, there’s also an element of certainty: inevitable death. It’s not the randomness that’s important; that’s just “noise” in the system. What matters is the long-term behavior, and that’s easy to understand: somebody dies. In six turns, there’s likely to be one shot fired; if the cylinder isn’t re-spun at each turn then there’s sure to be one shot fired.

Even if the cylinder is randomized each turn, we still expect one out of six turns to release a bullet. It might not; after all, there’s still randomness in the system. We might even escape fatality in 12 turns, or 18, or 120, or even thousands. It’s the nature of randomness that it can’t be predicted — these things can happen entirely by random chance. But the odds of a hundred turns without the gun firing are only about one in 83 million. The odds of a thousand turns with no bullet discharging are only one out of a million trillion trillion trillion trillion trillion trillion; so small that even though it could happen by random chance, we shouldn’t expect to see such an event even if we keep trying for the entire very long lifetime of the universe itself.
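All of these long-run odds come from one computation: the chance of surviving n independent pulls is (5/6)^n. Here is a minimal Python sketch, my own illustration rather than anything from the book, just to verify the arithmetic:

```python
from fractions import Fraction

# Chance of surviving a single trigger pull: 5 harmless chambers out of 6
p_survive = Fraction(5, 6)

def no_fire_odds(n):
    """Probability of n consecutive pulls with no bullet (cylinder re-spun each time)."""
    return float(p_survive ** n)

print(no_fire_odds(100))   # about 1.2e-8, roughly 1 in 83 million
print(no_fire_odds(1000))  # about 6.6e-80: possible in principle, but never expect to see it
```

Using exact fractions and converting to a float only at the end avoids any rounding worries even for a thousand pulls.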

If you fired a gun with only one bullet in six chambers, spinning the cylinder each time (but not pointing it to your head), and did so ten thousand times, you could collect some actual data about the odds of fatality. It’s not too hard to compute the odds even without doing the experiment — one out of six is a pretty straightforward computation. But if you did perform the experiment, you’d expect the gun to fire about 1,667 times out of 10,000 trials. It probably won’t be exactly 1,667 times; you might note only 1,600 firings, or maybe as many as 1,700. But the chance of no firings at all is vanishingly small, far smaller even than the odds already quoted for a thousand trials.

Of course, in ten thousand trials you’re bound to run into some stretches of many turns in a row with no firing, just by random accident. You could easily note, say, 25 turns in a row when the gun doesn’t fire. It’s the nature of randomness that such things not only can happen, they will happen. If the odds of not firing on a single trial are 5/6, then the odds of not firing 25 times in a row are (5/6)^25 ≈ 0.0105, or about 1 out of 95. There’s a 1-out-of-95 chance of the gun not firing on the first 25 trials; not very likely, but a far cry from one in a million trillion trillion etc. etc.

In ten thousand trials you’re likely to encounter at least one run of 25 with no discharge; in fact you’re likely to see a run that long, or longer, more than once. What most people don’t know is that with enough tries, not only is it possible for random chance to make surprising events like this happen, it’s next to impossible for such events not to happen. The fact that unlikely events are not just possible, but inevitable given enough trials, means that a proper statistical treatment of data can be very tricky business.
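A quick simulation makes the point concrete. This sketch is my own illustration; the seed and the number of repeated experiments are arbitrary choices:

```python
import random

def longest_no_fire_run(trials=10000, chambers=6, rng=random):
    """Longest streak of consecutive non-firings in one simulated experiment."""
    longest = current = 0
    for _ in range(trials):
        if rng.randrange(chambers) == 0:   # chamber 0 holds the bullet: the gun fires
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest

random.seed(123)
experiments = 100
long_runs = sum(longest_no_fire_run() >= 25 for _ in range(experiments))
print(long_runs / experiments)   # essentially every experiment contains a run of 25+
```

With 10,000 pulls, the expected number of runs of 25 or more is about 17 per experiment, so a run that long is not just possible but all but guaranteed.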

Now suppose someone is watching you do the experiment. At some point you encounter a run of 25 with no bullet being fired. That’s when the observer declares that the “1-out-of-6 chance of the gun firing” theory is wrong and Russian Roulette is safe!

He pronounces that you can go ahead and play Russian Roulette without worrying because all that talk about how it’s sure to be deadly in the long run, is just a hoax. Not only is the “1-out-of-6” theory wrong, he even informs you that it’s just a scam perpetrated by statisticians so they can get more government grant money for their research. Any laws against Russian Roulette should be repealed, and the whole no-Russian-Roulette movement is a conspiracy of anti-gun liberals who want to use it as an excuse to raise taxes on firearms and institute world government based on socialism. Sounds pretty crazy, right?

You probably wouldn’t take this person’s advice, wouldn’t support his proposals, wouldn’t even want to invite him to your house for dinner. But if you live in the state of Oklahoma, you just might have elected such a person to the United States Senate.

Signal and Noise

Most physical processes involve some randomness. They also usually involve some aspect which is not random. Russian Roulette is an example; there’s certainly a random element (the spinning of the cylinder makes the chamber selection random) but there’s a non-random element as well (there’s exactly one bullet in exactly six chambers). Although the result of a single trigger-pull is random, the probability on a single trigger-pull is not.

We can think of the probability — the non-random part — as the signal. Any individual result — a single pull on the trigger — is a combination of signal and noise. This is the statistical use of the word “noise”: it doesn’t refer to loud or unpleasant sounds but to the random component of a set of data. In almost all observed or measured data, noise exists.

Sometimes there’s noise in data just because measurements aren’t perfectly precise. For example, if we measure the speed of light in a vacuum we’re not likely to get the precisely correct answer simply because our measurement apparatus is imperfect. But the speed of light doesn’t fluctuate; it’s truly constant. In this case, we have an element of randomness in the data because of measurement error.

But it’s much more common for the process itself to show some randomness, like the Russian Roulette experiment. Just for fun, I used a random-number generator to simulate the experiment with 10,000 trials. Out of the 10,000 simulated trials there were 1,624 firings. This isn’t exactly equal to the “expected” number 1,667, because of the randomness inherent in the process. But it’s well within the range we might expect from probability theory. With 10,000 trials, we expect that most of the time the observed number of firings will be between 1,592 and 1,741. By “most of the time” I mean 95% of the time, which is the de facto standard for “most of the time” in scientific research. Such a range of likely results is referred to as a 95% confidence interval. For this particular experiment, the result was well within the 95% confidence interval.
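The quoted interval follows from the binomial distribution: with n trials and firing probability p, the number of firings has mean np and standard deviation sqrt(np(1-p)), and roughly 95% of outcomes fall within 1.96 standard deviations of the mean. A sketch of the calculation (my own, using the normal approximation; the seed is arbitrary):

```python
import math, random

n, p = 10000, 1/6

# 95% confidence interval from the normal approximation to the binomial
mean = n * p                      # 1666.67 expected firings
sd = math.sqrt(n * p * (1 - p))   # about 37.3
lo, hi = mean - 1.96 * sd, mean + 1.96 * sd
print(round(lo), round(hi))       # roughly 1594 to 1740

# One simulated experiment, for comparison
random.seed(2)
firings = sum(random.random() < p for _ in range(n))
print(firings)
```

The normal approximation gives endpoints a count or two inside the range quoted in the text; exactly where the endpoints land depends on how the interval is computed, but the picture is the same.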

What should we expect to get when we pull the trigger? In the colloquial sense we expect the gun won’t fire (there’s only a 1-out-of-6 chance it will). But in the statistical sense, the phrase “expected value” refers to the average expectation over a large number of observations. This is a standard computation in probability theory; the expected value is the sum, over all possible results, of the probability of that result times the value of that result. Another name for the expected value of a random variable is the mean.

What’s the mean for a round of Russian Roulette? The probability the gun doesn’t fire is 5/6, and if it doesn’t fire we note zero firings, so the value of that result is zero. The probability that the gun does fire is 1/6, and if it does we observe 1 firing so the value of that result is one. Hence for a single trial we get E(x) = expected value = (5/6) × 0 + (1/6) × 1 = 1/6 = 0.166666…

The expected value is one of the most useful concepts in statistics. But that doesn’t mean that, in the common sense of the words, we can expect to get the expected value! If we pull the trigger, either the gun fires or it doesn’t, so the observed value on a single trial is either zero or one. It’s just not possible to get one sixth of a bullet fired from the gun. Although the expected value truly describes what we expect on average over the long haul, for a single trial (and often, for a small number of trials) getting the “expected value” is actually impossible.
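The gap between the two senses of “expect” is the law of large numbers at work: no single pull can yield one sixth of a firing, but the average over many pulls converges on 1/6. A small sketch of my own (seed and trial count arbitrary):

```python
import random

random.seed(7)
p_fire = 1 / 6   # E(x) = (5/6)*0 + (1/6)*1 = 1/6

n = 100000
results = [1 if random.random() < p_fire else 0 for _ in range(n)]

assert set(results) <= {0, 1}   # every single trial is 0 or 1, never 1/6
print(sum(results) / n)         # but the long-run average is close to 0.1667
```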

Global Warming

It’s important to realize what global warming is and what it is not. Global warming is about the trend (the signal), not about the fluctuations (the noise). The lack of clarity about this is well illustrated by a comment submitted in early February 2009 to the (excellent) climate science blog RealClimate (http://www.realclimate.org/):


Michigan just experienced its 5th coldest recorded January (NOAA). This makes it difficult for the layman (even myself, despite all that I have read and studied) to maintain a level of concern about irreversible warming from CO2. When is a cold spike in temperatures notable? It seems like the last two years demonstrate some notable variance in the warming trend rather than just a weather change. (not a denier just an observer)

This is very much like saying, “We just observed 25 consecutive trigger pulls with no bullet discharged. This makes it difficult to maintain a level of concern about the danger of Russian Roulette.” The commenter (quite innocently) is taking note of the noise in the data and it causes him to doubt the reality of the signal.

From the point of view of statistical behavior, experiencing the 5th coldest January ever recorded in Michigan is nowhere near an indication of “notable variance in the warming trend rather than just a weather change.” Ironically, as Michigan brought its 5th-coldest recorded January to a close, South Australia was recovering from its worst heat wave ever recorded (http://www.guardian.co.uk/world/2009/feb/02/australia-heatwave-deaths) “enduring six consecutive days of temperatures reaching 113F (45C).”

Global warming isn’t about the day-to-day or month-to-month or even year-to-year fluctuations, and it’s certainly not about warming in all locations at all times. Just because climate changes, that doesn’t mean we won’t still have weather! There will still be cold days and hot days, but the hot days are likely to be a bit hotter and the cold ones not quite so cold. In fact there will still be cold months and hot months, because we’ll still have fluctuations — noise — in temperature data, even when the trend — the signal — is steadily increasing. And we’ll still see variations not only from month to month and year to year, but from place to place as well; while Michigan shivers through its 5th-coldest January on record, South Australia swelters through its worst heat wave. It’s the long-term trends globally, not the short-term events locally, that global warming is really about.

Cherry-Picking

The commenter on the RealClimate blog was an average person, struck by the recent cold January, asking an honest question based on a natural (but incorrect) interpretation of his direct experience. Those who deny the reality of man-made global warming exploit such natural events in order to spread doubt about the reality of global warming.

In the simulated Russian Roulette experiment, the longest consecutive span with no firings at all was a run of 38. If I had done only 38 trials, not 10,000, the probability of no firings at all would be a paltry 1 out of 1,021. Imagine the observer now, proclaiming that if the 1-out-of-6 theory is correct, the probability of that result is less than 1 out of a thousand! He insists that it proves the 1-out-of-6 theory is false, and that if we enact anti-Russian-Roulette legislation it will ruin our economy while providing negligible protection against Russian-Roulette-related deaths. Of course this is a fallacy. The chance of 38 no-firing results in a row is less than 1 out of 1,000 if we try only 38 times. But we tried 10,000 times, and that gives us a lot more chances to observe 38 in a row.
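Both probabilities the observer conflates, the chance of 38 straight clicks in exactly 38 trials and the chance of such a run somewhere in 10,000 trials, can be computed exactly with a little dynamic programming over the current streak length. This is my own illustration, not a calculation from the book:

```python
def prob_run_at_least(L, n, p_click=5/6):
    """Probability that n trigger pulls contain at least one run of L
    consecutive non-firings ('clicks')."""
    p_fire = 1 - p_click
    state = [0.0] * L   # state[k]: prob current streak is k, with no run of L so far
    state[0] = 1.0
    done = 0.0          # prob that a run of L has already occurred (absorbing)
    for _ in range(n):
        new = [0.0] * L
        for k, pr in enumerate(state):
            if pr == 0.0:
                continue
            new[0] += pr * p_fire        # gun fires: streak resets to zero
            if k + 1 == L:
                done += pr * p_click     # streak reaches L: run achieved
            else:
                new[k + 1] += pr * p_click
        state = new
    return done

print(prob_run_at_least(38, 38))      # (5/6)**38, about 1 in 1,021
print(prob_run_at_least(38, 10000))   # most 10,000-trial experiments contain such a run
```

The same number that looks damning in 38 trials turns out to be the likely outcome in 10,000: the probability of at least one run of 38 somewhere in the full experiment is well over one half.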

Unfortunately, from a propaganda perspective, the mythical observer can now provide exact mathematical calculations showing that the chance of 38 in a row is very small indeed. He can even point to published scientific research confirming how small that chance is. He doesn’t bother to mention that it doesn’t apply to the data being studied because these data don’t conform to the conditions necessary for that calculation. Maybe he’s simply not aware of that fact, and in his zeal to support Russian Roulette rights he’s not interested in hearing about it. Maybe he is aware, but makes that argument anyway.

You try to point out that there were a lot more than 38 trials in the experiment. You show that the full result, 1,624 out of 10,000, is well within the 95% confidence interval according to the 1-out-of-6 theory. You even show that there’s an overwhelming consensus among statisticians worldwide that the 1-out-of-6 theory is true, and that the run of 38 null results in a row is not at all unusual in an experiment with 10,000 trials. Finally, you protest that the observer has taken only the data he likes while deliberately ignoring all the data contradicting his less-than-1-out-of-6 theory. His argument is just a trick.

This trick not only enables the mythical observer to make his argument, it’s likely to be very persuasive to a lot of people (especially those who are afraid of liberal anti-gun nuts and world government based on socialism). It’s called cherry-picking: the practice of using only the data which is favorable to one’s preferred result while ignoring evidence which contradicts one’s preference. It’s one of the most common methods of deception in the global warming “debate,” one which is often used by those who argue that global warming isn’t happening, or that it’s happening but not due to human activity, or that it’s happening because of human activity but it isn’t that bad, or that it’s happening due to human activity and is bad, but … there’s nothing we can do about it.

Global Temperature

Let’s take a look at some actual temperature data, from NASA GISS (the Goddard Institute for Space Studies). The data are temperature anomaly: the difference between the temperature at a given time and what it used to be for the same time of year during a baseline period. Hence temperature anomaly measures how much the temperature is above or below average, with “average” defined by the baseline period. In this case the baseline is the average temperature from 1951 through 1980. This will tell us perfectly well whether or not global average temperature has changed over time; figure 1.1 shows the annual average temperature anomaly from 1880 through 2008 (http://data.giss.nasa.gov/gistemp/tabledata/GLB.TS+dSST.txt):

fig1_1

Annual average temperature anomaly from NASA GISS.

Two things are obvious. First, over the long haul temperature has changed. It’s gotten hotter globally, mainly in two episodes of warming, one in the early 20th century, the other since about 1975. Second, in addition to the persistent changes that have occurred (the signal) in global average temperature, there’s also a lot of up-and-down jitter from year to year (the noise). Part of the noise is measurement error, but most of it is actual random behavior of the climate system. So global temperature, just like most observed phenomena, is a combination of signal and noise.

In spite of the noise, the signal is big enough that the long-term trend is plainly visible. We can reduce the noise level, while preserving most of the signal, with a number of mathematical techniques. Probably the simplest is to average the data over longer time spans. These data are already averages, over 1 year each. If we compute the average temperature over each 5-year time span and plot that along with the 1-year averages, as in figure 1.2, the trend becomes even more evident while the noise is greatly reduced. We can even plot just the 5-year averages in figure 1.3, which shows the trend plainly with much less noise.

fig1_2

Annual average temperature anomaly (thin line)
and 5-year averages (thick line with dots).

fig1_3

5-year average temperature anomaly.

This illustrates a general principle: when we average over longer and longer time spans, there’s usually much more noise reduction than signal loss. But in fact even with 5-year averages of global temperature, it’s still possible for the value to decrease from one 5-year span to the next, not because the trend has reversed but only due to noise. Therefore 5-year averages, while much less susceptible to false impressions due to noise than 1-year averages, are still not immune to that phenomenon.
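The principle is easy to demonstrate with synthetic data: make up a series that is a steady trend plus random noise, then compare the scatter around the trend before and after 5-year averaging. All the numbers here are invented for illustration:

```python
import random, statistics

random.seed(3)

years = 130
signal = [0.01 * t for t in range(years)]             # trend: +0.01 per "year"
series = [s + random.gauss(0, 0.1) for s in signal]   # add noise with sd = 0.1

# Non-overlapping 5-year averages of the noisy series and of the pure signal
starts = range(0, years, 5)
avg5 = [statistics.mean(series[i:i + 5]) for i in starts]
sig5 = [statistics.mean(signal[i:i + 5]) for i in starts]

# Residual scatter around the trend, before and after averaging
noise1 = statistics.pstdev([x - s for x, s in zip(series, signal)])
noise5 = statistics.pstdev([x - s for x, s in zip(avg5, sig5)])
print(noise1, noise5)   # the 5-year averages show noticeably less scatter
```

Averaging five independent values shrinks the noise standard deviation by about a factor of sqrt(5) ≈ 2.2, while the trend passes through almost untouched; that is why the 5-year plot looks so much cleaner than the annual one.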

Climate and Weather

Climate is defined as the average and range of variation of weather over long periods of time. The main reason to insist on long periods of time is that over short periods, weather is random. In a sense, weather is not truly random because it’s determined by exact laws of physics. But weather also defies prediction more than a week or so into the future, because the physics governing weather exhibits the property of chaos. Discussion of chaos theory is well beyond the scope of this book, but suffice it to say that it makes the details of weather unpredictable, although we can still predict its average and probable range of variation. This leads to a statistical characterization of weather, and that statistical characterization is exactly what climate is. Over the long run, actual day-to-day weather measurements behave just like random variables, so it’s perfectly correct to treat them as though they were random.

The “standard” length of time to meet the requirements of estimating climate rather than just weather, is 30 years. Experience has shown that for quantities relevant to weather (temperature, rainfall and snowfall, wind and storms and air pressure) this is a long enough time span for the true “normal” behavior to emerge from the background noise. This gives us our first clue about how long a time is necessary to detect a change in the climate rather than just a change in the weather: about 30 years.

That doesn’t mean that all measurements have to span 30 years or more in order for us to establish climate change. If a climate-related variable is changing, the amount of time required to show it depends on how rapidly it’s changing (the size of the signal) and how big the noise is. A larger signal is easier to detect and requires less time; more noise makes the signal harder to detect, requiring more time. A good way to characterize how hard it is to establish that variables are changing over time is the signal-to-noise ratio: the bigger this ratio (the bigger the signal relative to the noise), the less time is required; the smaller the ratio, the more time is required. But as a very rough rule of thumb, the time span required to define climate is about 30 years.


(End note: here’s temperature data from both NASA and NOAA, updated to the present, with the final point showing the year-so-far average for 2015:)

globe

If you want the book, for yourself or a friend, you can get it here.

22 responses to “Russian Roulette”

  1. Analogy can be slippery. But this one strikes me as spot on. Thanks for posting.

  2. There is another aspect to the roulette game, and I’m unsure how it would affect outcomes.
    Say the experiment was carried out: we have unlimited test subjects, a gun and bullets, but we are looking to be lucky.
    There is a chance [assuming the rules include a spin every time the gun is passed to the next player] that we could have a long-term survivor.

    What will mess with the predicted outcome is the fact that every time a player is shot the game will reset, the following shots assumed in the statistics will not take place, and a new player is brought in.

    So instead of a 1-in-6 chance, it may be that the first trigger pull is the shot, a kind of 50/50. Denial is looking to be lucky; it is gambler’s logic. So 18 years of ‘no warming’, i.e. a lucky streak, could also happen in Russian Roulette: 18 pulls and no brains on the floor.

  3. AGW is like playing Russian roulette with a loaded automatic weapon. (e.g., photons from the sun.) Pull the (GHG) trigger and it starts firing, but it is hard to see where the bullets go, and after a bit, the players are stone deaf, and do not notice the noise of the weapon firing. After everyone that started the game is dead, the weapon will keep firing for a long, long time.

    As you have pointed out, sampling and data collection are more work than statistics. We have not made the effort to properly sample and collect data on AGW. AGW is total accumulated heat. The large sinks for accumulated heat are the oceans and the ice sheets. Our data on the changes in the heat content of the ice sheets and the oceans is full of gaps. Absence of data is not evidence of absence. Our air temperature databases are little help in estimating the change in Earth’s total accumulated heat.

    It is not that the gun did not go off; it is that we did notice the gun was firing, but did not bother to track where the bullets went and what, or who, was being disrupted.

  4. Hidden in your example (well, maybe not that hidden) is the deterministic physical model of a live cartridge in a revolver chamber.

    The denialati might argue there is no bullet, or if there *is* a bullet there has *always* been a bullet, and there’s no certainty that a bullet to the head is always bad.

    • Michael Hauber

      – Bullets are plant food.
      – The medieval arrow was bigger than a modern bullet.
      – We don’t argue there is no bullet, only that it is smaller than the ballistic experts claim
      – Bullets are good for the economy

  5. Hey, I bought that book a few years ago. I enjoyed it very much. I’ve just recently been mulling whether to buy Tamino’s stats book too or an alternative that addresses R and stats together. Any advice?

    • Note that my aim is to become sufficiently proficient in R and basic stats to be able to understand and challenge claims made on skeptic sites. Would Tamino’s book help me with the R side of things?

      [Response: I’m afraid not.

      A lot depends on how much stats you already know. Introductory texts don’t usually use R (but if someone knows of a good one, pipe up). I would say that R is very easy and natural, if you just start using it, use it a lot, use the help system, you may pick it up quickly. And there are books about R itself, which may be what you’re looking for.

      As for introductory stats, I think my book is quite good. It’s not perfect, but it emphasizes understanding rather than following a “cookbook recipe” and I think it does a pretty good job.]

    • In the introductory course for R in my university (of Helsinki) the recommended supplementary reading (in addition to our own material) is Peter Dalgaard’s Introductory Statistics with R (2nd edition, 2008). If you google it you should find a pdf of the book.

      For graphics and visualisation I recommend the R Graphics Cookbook by Winston Chang, which mostly uses the excellent ggplot2 package. It’s pretty good to start with and is easy to understand with ready “recipes”.

      For a very soft introduction to R you might want to try, well, Try R (tryr.codeschool.com).

      There’s lot of material available on the internet, so you’ll get many different recommendations. I’m not aware of good books on statistical reasoning (it seems to me books tend to be either cookbooks for “end users” or very mathematical in nature), but I’m sure others have something to recommend. I haven’t read tamino’s book, but I’d like to at some point.

      Also I recommend using RStudio as your programming environment instead of the default RGui which comes with R.

      Regards,
      an undergraduate student in stats

    • To my mind, R is best learned if you know what you want to do first. That is, if you have a clear idea of what you want to accomplish, there is a way to do it in R. It forces you away from the cookbook approach. Trouble is, you have to have some good fundamental knowledge first.

      That said, if you do not know the relevant functions but have a clear idea of what you need to accomplish, the online resources for R are immense. For instance, I had never had cause to look for run lengths till this post. Searching “R run lengths”, you will find R has a run length encoding function named, oddly enough, rle(). (Note, this is the OPPOSITE of cookbooking–you are _looking_ to do the right analysis not looking how to fit your data into some canned programme like SPSS.)

      Tamino’s analysis, then, is coded in R in a few lines. I could have compressed this much further but left it expanded for clarity…

      set.seed(104) # set random seed for replicability
      y <- rbinom(10000, 1, 1/6) # generate Bernoulli trials with "success" probability 1/6
      y_rle <- rle(y) # use run length encoding function
      y_table <- y_rle$lengths[y_rle$values!=1] # count runs of "fails"
      table(y_table) # output table of run lengths (row1) and counts (row2)

      Since I set the seed, you should get:

      y_table
      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 29 30 34 41
      183 219 151 111 118 90 90 66 54 53 39 27 31 16 16 22 8 10 9 5 5 6 7 4 4 2 5 1 2 2 1

      Note the run of 41 fails! But also note the 183 runs of minimally closely-placed fails (i.e., the sequence "fire"-"click"-"fire")!

  6. Well, given that the only reason to play Russian Roulette is a deep-seated death wish, is anyone up for a Freudian interpretation of climate denial?

    As a retired academic in the humanities, I really appreciate the clarity of your explanations.

  7. I already read, and I hope (understood), the book, which was a great introduction to global temperature measurement and media/ denier manipulation of the evidence.

    Tamino gives a fair description two posts up in this reply to Raff. I have some background in stats, but still found “Lies, Damn Lies etc” a stimulating read.

  8. Off-topic: Tamino, have you considered commenting or posing a question at Hansen et al. (2015)? Easy enough, and unlike some who already have, having published in climatology, you seem eminently qualified to do so. I for one would be interested. (Feel free to delete. Really.)

    [Response: Actually I don’t feel qualified to comment. My expertise is statistics, not climate science specifically, and all my climate-related publications deal with that specialty.]

  9. Implicit in your examples of the expectability of runs is something that should be made _explicit_: Observing a run of 38 empty chambers does NOT prove that “the Russian Roulette models do not work”.

  10. Everett F Sargent

    (0) OK, I know this is meant for a certain level.

    (1) But I do wonder about the N =1 vs the more general case of N=0,1,…,5,6 (there are seven choices, nothing special in that, but for odd numbers, the midpoint, N=3 has statistical symmetry, nothing special in that either, but for whatever reason, I always like N to be an odd number, in a weighting sense, mostly in numerical modelling).

    There is symmetry, so N=0 (p) is the same as N=6 (1-p), or some such..

    Anyways for N=0 (in all trials, but for which the data collector has no idea what N actually is), in that limiting case of the discrete system you are using (versus a continuum, or large N), we can make a statistical determination sooner than N=1 or 5?

    To complicate matters further, assume for each sample (each trigger pull), N=0,1,…,,5,6 is an asymmetric distribution we sample from …

    (2) The 30-year trendline and p=0.05 are (somewhat) subjective to what we think we know from pure statistics alone.

    The CO2 monthly atmospheric time series (1958-2015+) has high S/N ratios; calculate the standard anomaly time series and stairsteps emerge. They (i.e. CO2Trends) use a different method to calculate or remove annual variations, as the annual anomaly curve changes with time (with the underlying trend also changing).

    Basically actual known S/N ratios do matter.

    (3) Most numerical models are deterministic, certainly the AOGCM’s are deterministic models (BC/IC driven morphs into a BC problem due to chaos). So, for example, there will be spatial locations, that will fail all known statistical tests (the S/N is so great that the weak trends, near zero, will not pass those statistical tests, yet there is deterministic value obtained, there is an expectation that some areas/places will not pass pure statistical tests). We actually do see that at various locations in the real world. We also see statistically significant negative trends in the real world and in the deterministic AOGCM’s. This does segue into the issues involved with cherry picking.

    (4) Long story short? This stuff gets rather complicated rather quickly (you need to account for both deterministic (an inside tip in the stock market, the system is gamed to a certain degree) and stochastic (econometricians are quite obviously not trillionaires, if one was a trillionaire OMFG) processes, it’s not one or the other).

    DISCLAIMER: My writing style is not the best by far, so if something is lost in translation, go figure. I could also be very wrong, again go figure.

  11. Everett F Sargent

    As usual, I did something wrong (in (3)).

    “the S/N is so great” should be “the S/N is so small”

    Damn it, I’m confusing myself now. Anyways, “the S/N is so (something)”

    Signed,
    Lost In Translation

  12. s/Roullete/Roulette/g

  13. Andy Lee Robinson

    Deniers play Russian Roulette with one empty chamber, because getting shot in the head just stings a bit and has no chance of hitting anything useful.