Somethin’ ain’t right

I retrieved data myself, temperature at the 850 hPa level (about 1.5 km altitude) from 20th-century reanalysis, from ECMWF. I’ve looked at two locations on the equator, at longitude 45E and 60E, selected because they’re two places for which Sardeshmukh shows an increase in mean temperature but not in the probability of extreme temperature, defining “extreme” as more than 2 standard deviations above the mean. Here’s his graph of how the mean temperature has changed (in units of standard deviations):


When I analyzed the data, I got a different result than he got. A very different result. Somethin’ ain’t right, and I really want to know what’s goin’ on.

Most of us have seen graphs like this (e.g. the top panel here):


The thick black line shows the pdf (probability density function) for the normal distribution when the mean is zero and the standard deviation is one, while the thick red line shows the same thing when the mean value is increased by 1, a full standard deviation. The colored regions show the area under the pdf for values higher than 2, which is the probability of getting values that big or larger — we might even call them “extreme” values. For the black curve the chance of exceeding 2 (the extreme-high probability) is only 0.02275, but after the mean value gets bigger (all other things being equal) that probability increases to 0.1587, just about 7 times higher.

Let’s look at this in a different way. Instead of plotting the pdf (probability density function) which has the characteristic “bell curve” shape, let’s plot the cdf (cumulative distribution function) instead. Actually let’s use a different form of that, often called the survival function, which is just 1 minus the cdf, because the survival function is nothing more nor less than the probability of exceeding a given value.


Note that at x-value 2 the survival function is just 0.02275, but after increasing the mean by 1 it increases to 0.1587.

So that’s what happens when we increase the mean, all other things being equal: the chance of exceeding some “extreme” cutoff threshold also increases. In this case, dramatically (by a factor of 7) because the mean increased by an entire standard deviation.
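
For anyone who wants to check those numbers, here is a minimal sketch using only the Python standard library. The function name `normal_sf` is my own label for illustration, not something from Sardeshmukh's presentation; it just evaluates the survival function of a normal distribution through the complementary error function.

```python
import math

def normal_sf(x, mean=0.0, sd=1.0):
    """Survival function (1 - cdf) of a normal distribution: P(X > x)."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

threshold = 2.0
p_before = normal_sf(threshold, mean=0.0, sd=1.0)  # original distribution
p_after = normal_sf(threshold, mean=1.0, sd=1.0)   # mean shifted up by one sd

print(f"P(X > 2) with mean 0: {p_before:.5f}")
print(f"P(X > 2) with mean 1: {p_after:.5f}")
print(f"ratio: {p_after / p_before:.2f}")
```

The two probabilities should match the 0.02275 and 0.1587 quoted above, with a ratio just under 7.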

What if we increase the standard deviation (say, by 50%), but leave the mean alone? That looks like this in terms of the pdf (like the second panel here):


This time we haven’t shifted the distribution left or right, but we have widened it. Doing so also increases the probability of values above 2, in this case from 0.02275 to 0.09121, a four-fold increase. If we look at the same thing in terms of the survival function, we get this:

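The widening case can be checked the same way. This is only a sketch for the idealized normal distribution, and `normal_sf` is again a helper name of my own choosing. It compares the survival function for a standard deviation of 1 against a standard deviation of 1.5 at a few cutoff values.

```python
import math

def normal_sf(x, mean=0.0, sd=1.0):
    """P(X > x) for a normal distribution with the given mean and sd."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

# Compare the standard normal with one whose sd is 50% larger.
for x in (1.0, 2.0, 3.0):
    narrow = normal_sf(x, sd=1.0)
    wide = normal_sf(x, sd=1.5)
    print(f"x = {x:.0f}: sd 1.0 -> {narrow:.5f}, "
          f"sd 1.5 -> {wide:.5f}, ratio {wide / narrow:.1f}")
```

At x = 2 this should reproduce the 0.02275 and 0.09121 quoted above. The ratio between the two gets larger the farther out in the tail you set the cutoff.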

Having introduced this, let’s look at daily temperature data from longitude 45E along the equator, at the 850 hPa level, from reanalysis data:


Note that temperature is in Kelvin, which is why the numbers are so high. Visually it doesn’t seem to have changed much, but if we transform from raw temperature to temperature anomaly (in order to remove the seasonal cycle) we get this:

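One common way to turn raw daily temperature into anomalies is to subtract, from each day, the long-term average for that calendar day. The sketch below assumes that approach; it is not necessarily the exact recipe used for the graphs here, and the input is synthetic (a pure seasonal cycle around 290 K) purely for illustration.

```python
import math
from collections import defaultdict
from datetime import date, timedelta

def daily_anomalies(dates, temps):
    """Subtract each calendar day's long-term mean from the raw values."""
    by_day = defaultdict(list)
    for d, t in zip(dates, temps):
        by_day[(d.month, d.day)].append(t)
    # Climatology: the average temperature for each (month, day) pair.
    climatology = {k: sum(v) / len(v) for k, v in by_day.items()}
    return [t - climatology[(d.month, d.day)] for d, t in zip(dates, temps)]

# Synthetic illustration only: four years of a pure seasonal cycle in kelvin.
start = date(1901, 1, 1)
dates = [start + timedelta(days=i) for i in range(4 * 365)]
temps = [290.0 + 2.0 * math.sin(2.0 * math.pi * d.timetuple().tm_yday / 365.25)
         for d in dates]

anoms = daily_anomalies(dates, temps)
print(f"mean anomaly: {sum(anoms) / len(anoms):.2e}")
print(f"largest |anomaly|: {max(abs(a) for a in anoms):.3f} K")
```

Because the synthetic series contains nothing but the seasonal cycle, the anomalies come out close to zero; with real data what is left over is the day-to-day variability plus any long-term change.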

According to Sardeshmukh’s presentation (which Judith Curry was kind enough to provide), here’s my reading of what he did. First he isolated the anomaly data for Dec-Jan-Feb (winter in the north, summer in the south). Next he isolated two 25-year long time spans: 1901-1925 and 1981-2005. Then he estimated, for each, the probability distribution in order to estimate the probability of extreme heat, defined as more than 2 standard deviations above the mean. He doesn’t say which time span defines this cutoff limit, but he did say explicitly that the same absolute temperature cutoff was used for both sections. Finally, he computes how that probability has changed from the first time span to the second, and the graph he shows indicates that the probability didn’t change by much, not (according to my reading of the color scale) by more than 0.001.
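
As I read it, that procedure boils down to something like the sketch below. The two lists are random stand-ins for the seasonal anomalies in each time span, not the reanalysis data, and the choice to define the cutoff from the first span is my assumption (the presentation doesn't say). No normal distribution is assumed in the counting step; it's just the fraction of days above the cutoff.

```python
import random
import statistics

def exceedance_probability(values, cutoff):
    """Empirical probability that a value exceeds the cutoff."""
    return sum(v > cutoff for v in values) / len(values)

random.seed(0)
# Hypothetical stand-ins for seasonal anomalies in the two 25-year spans.
early = [random.gauss(0.0, 1.0) for _ in range(2257)]
late = [random.gauss(0.3, 1.0) for _ in range(2256)]

# Cutoff: 2 standard deviations above the mean of the FIRST span,
# then held fixed as an absolute threshold for both spans.
cutoff = statistics.mean(early) + 2.0 * statistics.stdev(early)

p_early = exceedance_probability(early, cutoff)
p_late = exceedance_probability(late, cutoff)
print(f"cutoff = {cutoff:.3f}")
print(f"P(exceed), early span: {p_early:.4f}")
print(f"P(exceed), late span:  {p_late:.4f}")
print(f"change: {p_late - p_early:+.4f}")
```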

I did the same thing. And here’s what I got:


The dashed line shows the cutoff limit of 2 standard deviations when we define the mean and standard deviation using the initial time span. The fact that it’s so close to the numerical value 2 is just a coincidence.

To get a better idea of how things changed, let’s zoom in on the upper range:


Note that from the first to the second time span the probability of exceeding that cutoff limit increased, and not by a tiny amount. It increased by about 0.014 which, from what I can see, is at least 14 times more than Sardeshmukh reports. Somethin’ ain’t right.

And there’s something else I find quite interesting. In his presentation Sardeshmukh includes what he refers to as a “math slide” in which he defines a “probability shift index” thus:


He mentions it as applying to the Gaussian (normal) distribution, but in fact it can be defined for any distribution, Gaussian or not. It can also be modified to indicate, not just the sign of the change, but by how much the probability of exceeding some limit will change. And here it is, in different notation (my own) which I’ll define for you:

\Delta S = f(z) [ \Delta \mu + z ~ \Delta \sigma ] / \sigma.

In this equation, \Delta S is the change in the survival function when the mean and standard deviation change but the shape of the probability distribution doesn’t change, i.e. the change in probability of exceeding the limit. f(z) is the pdf for a normalized version of the temperature limit, where the normalized version is simply

z = (x - \mu) / \sigma.

\Delta \mu is the change in mean, \Delta \sigma is the change in standard deviation.

Plugging in the numbers, this suggests that for the observed change in mean and standard deviation the “exceedance probability” will increase by 0.013. The observed increase was 0.014. That contradicts Sardeshmukh’s reported value of less than 0.001, and contradicts his assertion (at least for this location) that the change in exceedance probability “looks nothing like the mean warming pattern.” One cannot claim, on the basis of this data, that the change in exceedance probability is much affected by change in the shape of the distribution.
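
Here is a small sketch of that equation with f(z) taken to be the standard normal pdf. The changes in mean and standard deviation below are made-up values for illustration, not the ones from the reanalysis data, and the function names are my own. It compares the first-order estimate with the exact change for a normal distribution.

```python
import math

def normal_pdf(z):
    """Standard normal probability density at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_sf(x, mean, sd):
    """P(X > x) for a normal distribution with the given mean and sd."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

def delta_s_first_order(x, mean, sd, d_mean, d_sd):
    """First-order change in P(X > x): f(z) * (d_mean + z * d_sd) / sd."""
    z = (x - mean) / sd
    return normal_pdf(z) * (d_mean + z * d_sd) / sd

# Hypothetical values, NOT the ones from the reanalysis data.
mean, sd = 0.0, 1.0
x = mean + 2.0 * sd          # cutoff at 2 sd above the original mean
d_mean, d_sd = 0.1, 0.02     # assumed small changes in mean and sd

approx = delta_s_first_order(x, mean, sd, d_mean, d_sd)
exact = normal_sf(x, mean + d_mean, sd + d_sd) - normal_sf(x, mean, sd)
print(f"first-order estimate: {approx:.5f}")
print(f"exact normal change:  {exact:.5f}")
```

The equation is a linear approximation, so the two numbers agree closely only when the changes are small compared to the standard deviation; out in the tail the gap grows fairly quickly as the changes get bigger.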

If instead of defining our cutoff by the mean and standard deviation of the first interval, we had done so using the second interval, then the cutoff would be different. It would look like this:


The exceedance probability still increases, but now only by 0.007. That’s still at least 7 times more than Sardeshmukh shows.

Of course that’s only one location. What about the other one I’ve looked at, for longitude 60E? This:


Again I’ve set the cutoff limit (the dashed line) at the 2-sigma level for the first time span. Now the exceedance probability has increased by about 0.06, whereas Sardeshmukh’s graph of changes in exceedance probability shows it to be negative.

Somethin’ ain’t right. Maybe I’ve made a mistake. But maybe Sardeshmukh has, or maybe there’s more to what he did than meets the eye. I am genuinely curious to know, what’s going on?

I’ve also looked at 20th-century reanalysis data for these locations, not at the 850 hPa level, but at the surface (often called “T2m” for temperature 2 meters above the surface). They give the same results. I do have to wonder, if the subject is the increase in extreme temperatures due to global warming, why would you study the temperature at 850 hPa? It doesn’t make sense to me.

I’ve also looked at actual thermometer data for daily temperature, from a number of locations (mostly courtesy of ECA&D, the European Climate Assessment & Dataset network). Results: the same.

I was fully prepared to study these data, confirm Sardeshmukh’s results, then post an admission of error. Prominently, unambiguously. I still am — but thus far my attempts to confirm Sardeshmukh’s results have only contradicted them. Somethin’ ain’t right.

58 responses to “Somethin’ ain’t right”

  1. Everett F Sargent

    It could be me and it probably is but …

    Publishing in a chaos theory journal might not give due attention to boots-on-the-ground observational data (as opposed to model data (reanalysis, partial AOGCM or otherwise)).

    A theory must be testable, most often that is done with actual empirical observational data. Saying that the modeled higher moments more than offset (e. g. higher positive skewness or higher kurtosis) the lower moments is suspect if no effort is taken to compare with the actual boots-on-the-ground observational records.

  2. If what you are interested in is heat waves and their effect on people, well heat waves only have effects on people on land so doing this on a global basis is in principle wrong and given the heat capacity of the oceans that will reduce the variance, it is likely misleading.

  3. Everett F Sargent

    What Eli said, land surface matters most to us humans.

    Also, this does appear to have a rather long history, back to at least 2005:

    Multiplicative Noise and Non-Gaussianity: A Paradigm for Atmospheric Regimes?

    Perhaps a bit too theoretical for me, but I tend to like real data anyways.

  4. Is the X-axis of your ‘survival’ function graphs meant to be in °K? I would have expected s.d. from mean.
    Saying that, the graph of (Δmean)/σ (as in post above) and (Δσ)/σ both show positive values for the region of your examination, the (Δmean)/σ with a value in excess of 1.2. Yet the increase in Δmean in the first ‘survival’ function graph (the only one you show in full) is surely less than +0.5 (of whatever) and if I had to say, Δσ is surely negative. Are these discrepancies more stuff that “ain’t right”?

    [Response: My x-axis scale is in degrees K.]

  5. Horatio Algeranon

    I think what we have here is a case of the Indian Ocean Mile High Triangle where different physical laws apply.

    i have heard it has very bizarre effects on any probability distribution which is unlucky enough to pass through the region at that altitude.

    in fact, some probability distributions have disappeared without a trace.

  6. Sardeshmukh is not your student, and you bear no responsibility for his failure(s), or even those of JC in citing his work without checking his math.

    Your account above is for the assumption of normal distributions which is a reasonable assumption for thermodynamic systems in equilibrium. However, the Earth’s weather is being forced, and the thermodynamic system is not in equilibrium.

    The second period is warming much faster (more forced) than the first period and has a different distribution – it is more fat tailed to the right. Therefore, your approximation based on the assumption of normal distributions understates reality.

    Until we get the system back into control, the probability of exceeding a given temperature value (above the current mean at that location) in some later period is one. The current, out of control system, is not kind to mammals.

    IPCC curves avoid this by not considering carbon feedback and by assuming that human emissions will diminish. Yes, at some point, human emissions will diminish. However, for me to ignore carbon feedback, I need a physical mechanism that will reverse the trend in Arctic carbon release, AND a physical mechanism that will prevent decomposition of sea floor clathrates by a warming ocean system. The IPCC has been reticent on these points.

    [Response: My calculation of the change in probability based on the equation incorporating change in mean and standard deviation, does use the normal distribution. But nothing else in this post assumes, or depends on, the normal distribution. In particular, the observed change in exceedance probability makes no normality assumption.]

  7. Are you sure this is the same data set as Sardeshmukh used?

    [Response: Somehow I doubt it. But it is 20th-century reanalysis data for temperature at the 850 hPa level.]

    Also, if it is the same, how many data points are we talking about? The script doesn’t seem to be too hard to write. It looks like it is mostly a matter of getting all the temp data.

    [Response: The entire time span includes over 40,000 data points (since it’s daily data). Restricting to Dec-Jan-Feb, and limiting to 25-year time spans, reduces the number to 2,257 for the first time span and 2,256 for the second.]

    And even if it is not the same source, if you (or anyone else) could produce what the change in >+2SD map looks like with this data, and compare it with Sardeshmukh’s maps it would, I expect, be most enlightening.

    [Response: I agree. I’m considering it, but it’s a lot of work.]

    • Thanks for the response. I should have been more clear but I was thinking how many surface points are there in the ECMWF reanalysis?

      Side note: I used to work with this data when it was 2.5 deg resolution and only went back to 1960s (my PhD was heavily dependent on it – great data set even then!). Nice to see that it goes back so much further now. I assume the resolution has also gone way up.

      [Response: The numbers I gave were the number of time points, which is what I thought you were asking. The ECMWF reanalysis is now available on a 0.125 by 0.125 degree grid.]

      • For those who don’t know, we are only talking about 10 years since it was only back to the 1960s at a 2.5 degree resolution. It’s not that I am super old.

  8. Not directly relevant, but my drought paper has been accepted. “Accuracy Check on Predictions of Near-Term Collapse” will appear in the British Journal of Science.

    This will be my third article in a peer-reviewed science journal, and the first dealing directly with climate science. Tamino-sama, thank you for checking the statistical work in the first draft.

    [Response: Congratulations. Do let us know when it’s published.]

    • Barton, the British Journal of Science (assuming I found the correct one online) is one big money scam, I am sorry to say.

      Just try and find the Editor-in-Chief at the University of California. I could not find any “Howard Longford” working at any of the UCs. Next name on the Editorial Board is one “Muller Kent” at the University of Groningen, and once again that person does not exist in the database of, in this case, the RUG (its Dutch abbreviation). It does not get much better with any of the names after that, although I could find some of them (e.g. Kathy Agard, who has retired some 5 years ago).
      The supposed managing director, Garaham Mark Hibbons, apparently has had a mom and dad who had trouble spelling.

      I checked the first issue, and noted one paper where the (Dutch) author apparently did not want to give anyone his first name. I could not find any other papers of a “Van der Stoop” from the University of Amsterdam, Department of Psychology, anywhere, so I became very suspicious. So I copied a few sentences into google, and I found a working paper from someone at the University of Ljubljana, which was the same as this supposed paper from a Dutch author. I have no doubt the original author is indeed Hugo Zagoršek rather than this “Van der Stoop”, since the at times idiosyncratic English is typically Eastern European, not Dutch. In other words, a plagiarized article with likely a fake author name. Note that all issues after that solely contain papers from, let’s be kind, traditionally not the strongest scientific environments.

      I also noted they claim to be indexed in Scopus, which they are not. I could not check this here at home, but I also doubt they are indexed by EMBASE and Geobase. Note these are all Elsevier bibliographic databases, in which the journal also claims to be indexed…

      Note that Jeffrey Beall has discussed this journal in a 2013 article, coming to the same conclusion as I did:

      Barton, I think you need some help if you want to publish in at least modestly trustworthy journals.

      • Sam Taylor


        Take it from someone who lives with one, Slovenians do not take kindly to being called “Eastern European”! I get in lots of trouble if I ever do that. “Central” is the preferred geographical region.

      • Barton, consider my last remark an offer.

        Sam, fair enough. I’ll take better care next time.

  9. Everett F Sargent

    I do think a lot of this has to do with the reanalysis products, in which the error bars prior to say ~1950 appear to overwhelm any real meaningful analyses of statistical moments (other than perhaps the mean):

    Independent confirmation of global land warming without the use of station temperatures (Sardeshmukh 2nd author)

    Their abstract:

    “Confidence in estimates of anthropogenic climate change is limited by known issues with air temperature observations from land stations. Station siting, instrument changes, changing observing practices, urban effects, land cover, land use variations, and statistical processing have all been hypothesized as affecting the trends presented by the Intergovernmental Panel on Climate Change and others. Any artifacts in the observed decadal and centennial variations associated with these issues could have important consequences for scientific understanding and climate policy. We use a completely different approach to investigate global land warming over the 20th century. We have ignored all air temperature observations and instead inferred them from observations of barometric pressure, sea surface temperature, and sea-ice concentration using a physically based data assimilation system called the 20th Century Reanalysis. This independent data set reproduces both annual variations and centennial trends in the temperature data sets, demonstrating the robustness of previous conclusions regarding global warming.”

  10. Martin Smith

    You said “Somehow I doubt it,” so I assume Sardeshmukh has not contacted you. That seems odd to me. If I were him, knowing your work, I would want to resolve this difference.

  11. My own feeling is that temperatures derived through global reanalysis using sparse century-old data don’t allow this level of statistical analysis of the extrema. I’d prefer to see if there are changes in adjusted ground station maximum temperature data (standard deviation, distribution, hot extremes) over these periods… this would probably require several dozen or several hundred stations to achieve some level of statistical weight.

  12. The most interesting for me is the distribution of data a little to the right from the 2 sigma limit. At the 45E plots the blue line goes above the red one, for temperature anomalies higher than ~2.25 K. If one chooses arbitrarily the heat wave definition at 2.5 K one can indeed infer no changes of heat waves.
    Apparently the blue distribution has much longer tails than the red one. I don’t think that this is realistic. I guess that the reanalysis based on sparse data from the beginning of the century is simply less constrained by reality and hence can wander more freely.

  13. I concur with Magma. Here is a report that touches on this subject.

    Cornes, R. C., and P. D. Jones (2013), How well does the ERA-Interim reanalysis replicate trends in extremes of surface temperature across Europe?, J. Geophys. Res. Atmos., 118, 10,262–10,276, doi:10.1002/jgrd.50799.
    From the abstract:
    The reanalysis is least successful in replicating trends in the number of days exceeding the 90th percentile of maximum temperature, particularly during the summer season. The success of the reanalysis is also somewhat dependent on the time step of the reanalysis data used. Daily maximum and minimum temperatures calculated from the 3-hourly time step reanalysis data tend to be more reliable than those derived from the 12-hourly data.

  14. It’s interesting that the (1-cdf) plots do cross at around 2.2K for the first location, if I am reading them right. I suppose this is because of the more heating at night effect.

    [Response: It’s by no means certain, because if you compute the uncertainty in (1-cdf) (which I did) you find that for temperatures that rare, the difference cannot be established with statistical significance.]

  15. Is there a confusion between ECMWF’s ERA-20C and NOAA’s 20CR here?

  16. Worst possible news: I just learned “British Journal of Science” is a scam journal. Boy, do I feel stupid.

    [Response: You have my sincere sympathy. Don’t feel too bad, it’s all too easy to fall for a scam. I just hope you didn’t send ’em any money.]

    • Barton, don’t consider any journal on Jeffrey Beall’s list of predatory open access publishers. Google will pull it up.

    • The whole point of a scam is to trick people, and when it comes to scam journals, to trick reasonably bright people, so don’t beat yourself up for “feeling stupid”.

      It’s a con. Something con artists do. “con” in this context is an abbreviation of “confidence”, i.e. the trick lies in gaining the confidence of the victim. Since honest people like ourselves tend to believe that most people we’ll run across in our daily lives are at least somewhat honest, and in the context of something like journal publishing find it hard to believe that outright dishonest people do business in the field, it’s simply normal to take things at face value.

      You’re a victim of a scam. That doesn’t make you stupid, it makes them smart, exploitive, dishonest, hell-deserving scum of the earth.

    • Bernard J.

      BPL, I’d like to echo dhogaza’s comments.

      Our institution (top 10 in Aus, top 2% internationally) has both postgraduate students and academics who are still caught by these predatory journals. And although rare it’s not always the less-experienced academics who are caught out. We have an active campaign to educate people about it but it could probably be ramped up further as it can be very difficult to discern between an up-and-coming new journal with sincere intent, and a flat-out dodgy one. And sometimes the former seem to be the latter simply because they don’t yet have a reputation – even Beall occasionally switches particular journals from one category to the other. To make it even more difficult some journals operate with a fairly predatory business model, but have a fairly good standard of review: overall there’s a smeared spectrum of quality in the publishing milieu.

      The predators are well organised, harvesting email addresses from web pages and sending a constant flow of (often very credible-appearing) solicitations, to the extent that some researchers occasionally receive several per day. For the unwary they can be very hard to spot, and some are difficult to identify even for those who have experience investigating them.

      The watchword here is, almost tautologically, “vigilance” – even for the most intelligent amongst us.

  17. Oh, I sent them 150 GB pounds, which is (today) $242.35. That’s not what bothers me. What bothers me is that I might as well have published in “The Journal of Easily Scammed Scientist Wannabes.” I might just have destroyed my reputation as a serious scientist, and worse, discredited the research itself. I have a strong urge to jump off the 6th St. bridge.

    [Response: You are far more capable and intelligent than you’re giving yourself credit for right now. As for being a serious scientist, don’t worry about reputation — just do science.

    There is no blow to your reputation that can’t be overcome, and those who would judge you based on publishing in a scam journal will get exactly as much respect as they deserve (at least, here). Believe me. It’s true.

    It is a pity that good work will be tainted by where it ends up. But the greatest pity in all this, might be that a worthless garbage journal could point to good work and claim “credit” by riding on your coattails.]

    • What Tamino said


    • BPL, I think you can retract the work and present it in another venue. I’m sorry there are all these outfits. It is obscene.

    • Barton, you are far from the first to be scammed. I know of Nobel Prize winners who turn up at OMICS conferences and are used as advertizing material for those conferences. Sadly, the majority of the other participants are the other speakers, of which most will have paid for the conference themselves. Gaps in the schedule are common because the speaker did not show up (not uncommonly because they never agreed), etc.

      I also have colleagues who are on Editorial Boards of certain scam journals. Some (try to) get off the Editorial Board right away when I inform them, others just shrug their shoulders, because they don’t do anything anyway and they don’t put it on their CV. The latter people really bug me, as they are in a way co-responsible that sincere people like you get scammed.

    • Evidence that more experienced people are fooled, too, is here:
      The first author, Bloetscher, is an Associate Professor at Florida Atlantic University.

  18. Fred Rogers

    The only observations in the early 20th century at the two points you chose were probably ship observations. If you look at the COADS dataset for ship observations, you can see perhaps 1 or 2 observations of surface air temperature per month in the general vicinity. No upper air observations at all, of course. So I doubt that the 850 hPa numbers in the reanalysis mean very much. Perhaps Western Europe or the eastern US would be better choices, but even there upper air observations were quite sparse. I personally think that it’s pointless examining global reanalysis data earlier than 1979 (the satellite era).

  19. David B. Benson

    A small point: the SI unit for absolute temperature is “kelvin”, abbreviated by “K”. There is no degree symbol as “degree” is not part of the name.

    This is different than the Celsius scale for which the degree symbol is to be used, after a space separating it from the number. Example: 33 °C = 306 K.

    • David B. Benson.
      Thank you for the pointer. It’s the sort of detail that would go unnoticed in the general run of things. I see that the ° was killed off back in 1969, which is long enough ago to leave but one potential defence – “Bill Thomson invented them. He called them degrees of temperature so that’s what they are!!!” – although I’m not sure that he did.

    • Pierre-Normand Houle

      That’s good to know. So, before the 13th General Conference on Weights and Measures, (1967–1968), it was called “degree Kelvin”, and written °K. Thereafter, it’s been called Kelvin, and written K. Though his nephews and nieces always called him “uncle Bill”.

  20. Steven Mosher

    Ordinarily I would email the guy privately and ask him.

    it would be way easier if people posted code and data.

    it’s very easy to go wrong with reanalysis data.

    [Response: I’ve already done so.]

  21. If we’re interested in extremes of temperature, why did Sardeshmukh elect to work with anomalies? Is there a good reason for that choice I’m missing? Surely this is one case where the absolute temp is more important. Do people really care whether their heat stroke was caused by a 1-sigma July or a 3-Sigma April?

    [Response: I was wondering the same thing.]

    • Forgive me if this has already been mentioned, but if there were a change in the higher order statistics, wouldn’t this be sensitive to the choice of anomaly baseline period? That is, due to seasonal shifts, one would expect longer tails the farther one gets from the baseline if the PDF shape is changing.

    • Well, both types of extreme (absolute, and relative) matter in their own ways. For example, in a cool temperate climate, temperatures might routinely be below 0 C (i.e., freezing) for many days during winter … but a 48 hour period of sub-0 temperatures during the growing season might wreak havoc with particular crops. On an anomaly basis, this freak summer cold wave would stand out as “extreme” . If you don’t use anomalies, it would be hidden among the statistics of many similar days in winter.

  22. Noting the link provided by ErnstK in the comments as the probable source of Sardeshmukh’s data:

    This reanalysis is somewhat different from the ERA and other reanalysis data sets, as it only assimilates sea level pressure! Therefore it’s not unlikely that what goes on at 850 hPa may have quite different statistics compared to other reanalysis data sets. I suspect that confirming Sardeshmukh’s results will need to be done with the same reanalysis data set (although comparison with other data sets should be equally interesting).

    • Horatio Algeranon

      Given the availability of historical land surface thermometric readings taken where people actually live, what’s the point in even using “reanalysis” data (especially for 1.5km high)?

  23. Timothy (likes zebras)

    Doesn’t HadCRUT4 go back further than this? It would certainly be more suitable for this type of analysis.

    It’s a really bad habit among some researchers to use reanalysis data inappropriately. In particular, you have to be very careful about using mid-tropospheric products from reanalyses, because they will not be homogeneous.

    The later period will be much more tightly constrained by satellite observations, while the earlier period will only be constrained directly by a much smaller number of radiosonde and aircraft observations and indirectly by surface observations, though in that case mediated by the convection and cloud schemes in the reanalysis model.

    I know the products are supplied at a high resolution with no missing data areas, which makes for prettier plots than the HadCRUT4 data, but it’s really important to emphasise that this data is not appropriate for this sort of analysis.

    I know that tamino here is trying to understand someone else’s analysis, and so didn’t make the data choice, but I thought it worth pointing out for the unwary.

    There are specific surface temperature observations datasets, such as HadCRUT4, BEST, GISS, etc, which should be used in preference.

  24. Joan Savage

    I’m a retired ecologist, and in plant phenology, at least anecdotally over twenty years, I now see more variability in spring plant bloom time, a variability big enough to occasionally mask the overall trend (mean) towards earlier spring blooming. I would not be surprised if the combo of shifting means and spreading standard deviation showed up in many phenomena, like the stream flow data for which USGS calculates annual-exceedance-probability on a regional basis.

  25. I’m guessing that the standard deviation has decreased with time. That is the easiest way to explain why the rise in extreme values isn’t as big as it would be with constant standard deviation and an increasing trend.

    That is, any rise in extreme values is most likely an artifact caused by some change in data over time. If other temperature data (e.g. surface temperatures) don’t show the same behaviour, that would increase my doubt over the 850 hPa data.

  26. Horatio Algeranon


    Something shifty’s in the air
    Reanalysis everywhere
    Making heatwaves disappear
    By choosing data way up there

  27. Here’s a link to my article, under the incredibly appropriate title of “Development of Broncho-Pulmonary Segments in Albino Rat:”

  28. Martin Smith

    Is this still moving toward resolution?

    [Response: Nothing new.]