(One of) the Problem(s) with Judith Curry

Judith Curry was recently a witness testifying at a hearing before the Environment and Public Works Committee of the U.S. Senate. Her written testimony is available here.

Continue reading

Malpractice

Some of you may have heard of the “journal” Pattern Recognition in Physics. It’s a new journal (only 2 issues), but already many of us have come to regard it as nothing but a mouthpiece for some rather loony climate denier nonsense.

It’s published by Copernicus Publications, an otherwise reputable outfit. Have they undermined their credibility forever?

No!

http://www.pattern-recognition-in-physics.net/

Southern Discomfort

In AR4 (the 4th assessment report of the Intergovernmental Panel on Climate Change) the trend in Antarctic (southern hemisphere) sea ice was reported as small (5.6 +/- 9.2 thousand km^2/yr) and not statistically significant, but in AR5 (the 5th assessment report) it is reported as both statistically significant and much larger (16.5 +/- 3.5 thousand km^2/yr). Even at this rate the Arctic is still losing sea ice 3 times as fast as the Antarctic is gaining it, but the larger trend is still surprising; such a large rate of increase is, more and more, turning out to be incompatible with computer model simulations.

A new paper submitted to the Cryosphere Discussion (Eisenman, I., Meier, W. N., and Norris, J. R.: A spurious jump in the satellite record: is Antarctic sea ice really expanding?, The Cryosphere Discuss., 8, 273-288, doi:10.5194/tcd-8-273-2014, 2014) suggests that much, if not most, of the upward trend in southern hemisphere sea ice may be due to a spurious jump caused by an undocumented change to how the data are processed. It also explains the dramatic difference in the state of affairs between what was reported in AR4 and what was in AR5 just a few years later.

Continue reading

Smooth 3

In the last post we looked at smoothing time series by focusing mainly, or exclusively, on local data, i.e., data which are nearby (in time) to the moment at which we’re trying to estimate the smooth. This is usually accomplished by having an observation window or weighting function which is centered at the moment of estimation, and we let this window “slide” through time to generate smoothed estimates at all times. We even showed that fitting a polynomial or a Fourier series globally (i.e. to all the data at once) gives results which are, for times not near the edges of the observation window, nearly the same.
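The sliding-window idea is easy to make concrete in code. Here’s a minimal Python sketch (the helper name and test data are invented for illustration; this is not the smoothing program used on this blog):

```python
import numpy as np

def moving_average(y, half_width):
    """Centered moving-average smooth: at each time step, average the
    observations within +/- half_width steps (truncated at the edges)."""
    y = np.asarray(y, dtype=float)
    smooth = np.empty_like(y)
    for i in range(len(y)):
        lo = max(0, i - half_width)
        hi = min(len(y), i + half_width + 1)
        smooth[i] = y[lo:hi].mean()
    return smooth

# A noisy ramp: the moving average recovers the underlying trend.
t = np.arange(100)
y = 0.1 * t + np.random.default_rng(0).normal(0, 1, 100)
print(moving_average(y, 5)[:3])
```

Each smoothed value depends only on observations inside the window, which is exactly what makes this a “local” method; near the edges the window is truncated, which is one source of the endpoint trouble discussed in these posts.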

Continue reading

Aitazaz Hassan: a typical Moslem

Maybe you don’t trust him. Maybe you even fear him.

Here in the U.S., there’s a very strong anti-Moslem sentiment from a large segment of the population. Don’t bother denying it in the comments, I live here and I know.

Most of those who fear or mistrust followers of Islam have an image of the “typical” Moslem being a terrorist, ready to die with assurances that beautiful virgins await him in heaven, if only he can send some infidels to hell when the bomb goes off.

Continue reading

Smooth 2

[Note: please see the UPDATE at the end of this post]

In the last post we looked at smoothing data by fitting a polynomial, and by fitting a Fourier series. We found that we could control the degree of “smoothness” in either case (by polynomial degree or by cutoff frequency) and that with the right amount of smoothness both methods did a reasonable (and sometimes outstanding) job over most of the observed time span, but did much worse near the endpoints of observation. Let’s forget about the endpoint mess for the moment, and concentrate on the performance of the smoothing near the middle of the window of observation.

Continue reading

Smooth 1

Suppose you are asked to determine how y, the severity of an unhealthy reaction to some toxin, relates to x, the amount of toxin one is exposed to. You expect ahead of time that the relationship will be monotonic, i.e. that increasing the exposure cannot decrease the severity, and that in fact the relationship will be “smooth” (whatever that means). You also have just enough laboratory resources to collect some actual data, so at equally spaced exposure levels from 1 to 9 you measure the reaction severity as accurately as you can, coming up with these data:

Nine

Suppose now that we want to know the likely severity at some point in between those integer-valued measurement points from 1 to 9. What if we wanted to know the severity for an exposure of 4.5, or 2.271? In other words, what if we want to interpolate to get an estimate?

The classic way would be to define a function at all values x from lowest to highest — not just the x values at which we have observations — by simply “connecting the dots” (as has already been done in the above graph). The line segments from one data point to the next define the function values at intermediate x values.

That’s fine, and the function so defined is at least continuous. But it’s not smooth: it makes instantaneous changes in direction at the observed x values (in other words, its derivative is undefined at the observed x values). In between those values, it’s perfectly smooth — a straight line segment is as smooth as can be — so this “classic” interpolation function is piecewise-smooth. But it isn’t smooth, and the merest glance at the jagged corners connecting the straight-line segments makes that obvious.

And we already said we expect y to be a smooth function of x. Perhaps we could find a smooth function which matches all the data values at the measured levels. It’s easy to do so: we can fit any 9 data points with an 8th-degree polynomial. Then we can hope that the polynomial approximates the smooth behavior we’re after, and compute the value of that polynomial at the “in-between” points in order to interpolate. The smooth function looks like this:

poly8
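For the record, here’s how such a fit can be computed in Python. The severity values below are invented stand-ins (the real ones appear only in the figure); the point is that 9 data points determine the 9 coefficients of an 8th-degree polynomial exactly:

```python
import numpy as np

# Hypothetical severity readings at exposures 1..9 (stand-ins for the
# values shown in the post's figure).
x = np.arange(1, 10)
y = np.array([0.5, 0.9, 2.1, 3.8, 6.2, 9.1, 12.4, 16.3, 20.9])

# An 8th-degree polynomial has 9 coefficients, so the least-squares
# "fit" actually passes through all 9 points exactly.
coeffs = np.polyfit(x, y, deg=8)
p = np.poly1d(coeffs)

print(p(4.5))   # interpolated severity between the measured points
```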

Ouch. That’s kinda wiggly and bumpy in ways that you don’t expect reality to be. First of all, we expected a monotone increase — up only — so that points with higher x can’t have lower y, but this “smooth” curve goes both up and down willy-nilly. Hell, it even dips to negative values, which for our “severity” variable is nonsense. And, it just keeps changing direction too fast and too often to be sufficiently “smooth” for us, it’s too “wiggly” and “bumpy”, too … “rough.”

Well, polynomials aren’t the only way to model a completely general function: we could use a Fourier series instead. We can model any nine data points with a 4th-order Fourier series, which gives us this:

Fourier4
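A Fourier-series fit can be sketched the same way (again with invented stand-in data): a 4th-order series has 1 + 2*4 = 9 basis functions, so it too matches 9 points exactly:

```python
import numpy as np

# Same hypothetical 9 readings as before (stand-ins for the figure's data).
x = np.arange(1, 10)
y = np.array([0.5, 0.9, 2.1, 3.8, 6.2, 9.1, 12.4, 16.3, 20.9])

def fourier_design(t, order, period):
    """Design matrix: a constant column plus cosine/sine pairs up to `order`."""
    cols = [np.ones_like(t, dtype=float)]
    for k in range(1, order + 1):
        cols.append(np.cos(2 * np.pi * k * t / period))
        cols.append(np.sin(2 * np.pi * k * t / period))
    return np.column_stack(cols)

# 4th order: 1 + 2*4 = 9 basis functions for 9 points -> exact fit.
period = x.max() - x.min() + 1          # treat the data span as one "cycle"
A = fourier_design(x, 4, period)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

fit = A @ coef
print(np.max(np.abs(fit - y)))           # ~0: passes through every point
```

Dropping `order` to 1 or 2 in this sketch gives the smoother, imperfect fits discussed below.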

Pfeh! That’s just as bad as the polynomial fit!

What is Smooth?

Hold on for a moment here. We started out saying that polynomials (and the Fourier series) could give us a smooth function through our data — and they do. But then we say it’s not “smooth enough”? Just what does one mean by “smooth” anyway?

The strict mathematical definition of a smooth function is one which is infinitely differentiable. Polynomials meet that criterion, as do sines and cosines, so both polynomials and Fourier series do indeed provide us with smooth functions. But we didn’t like the fast wiggling, the jiggling around from point to point of our smooth functions either. What we really want is something that’s more smooth than just the fit-all-the-data-perfectly option.

It just so happens that most of the “rough stuff” is the fast stuff, and when it comes to Fourier analysis, the fast stuff is the high-frequency stuff. What if I fit a Fourier series to my data, but only to 1st order, i.e. using only the lowest non-zero frequency? Then, I should have a slower — and, we expect, smoother — fit. It looks like this:

Fourier1

The fit isn’t very good. But it isn’t horrible either. There’s genuine (and statistically significant!) correlation between the data values and the model values. It obviously doesn’t capture the variation well, but it does capture some important aspects of the overall quantitative behavior.

It also highlights a difference when we try to model our data using only sufficiently “smooth” functions. Namely, that our model no longer matches all the data perfectly. There are differences, which we can call residuals. In the case of our 1st-order Fourier fit, the differences are substantial.

We could try to capture more of the “signal” — or so we might call the true value of y as a function of x if we thought such a thing even existed — by using a higher order Fourier series. If we go to 2nd order, we get this model:

Fourier2

The match is better but not great, and it’s already wiggling around faster than we were hoping for. But it is getting closer.

What if we try the “use only slow functions” strategy with polynomials? In this case “slow” generally means low degree while “fast” means high degree. If we limit ourselves to a 2nd-degree (quadratic) polynomial, we get this model of the data:

poly2

Now we’re getting somewhere! The fit is outstanding, it’s plenty “smooth” enough to satisfy anybody, and it’s always going in the “up” direction.

There are still residuals, although they’re quite small and don’t show any apparent pattern. What we have done is to separate the data into two parts: the smooth part (in this case, a quadratic polynomial) and the rough part (the residuals).

smooth9

We might even hypothesize that the smooth (in this case, the quadratic fit) is our best approximation of reality, and that the residuals are an example of the “noise” (departure from expectation) in the system.

And we’d be exactly right. These data were created by computing a quadratic function and adding random noise.
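That construction is easy to reproduce. Here’s a sketch (the quadratic coefficients and noise level are invented, since the post doesn’t give its actual values):

```python
import numpy as np

rng = np.random.default_rng(42)

# Reconstruct the setup described in the text: a quadratic "signal"
# plus random noise, sampled at exposures 1..9.  Coefficients and
# noise level are invented for illustration.
x = np.arange(1, 10)
signal = 0.25 * x**2
y = signal + rng.normal(0, 0.3, size=x.size)

# The quadratic fit is the "smooth"; what's left over is the "rough".
coeffs = np.polyfit(x, y, deg=2)
smooth = np.polyval(coeffs, x)
rough = y - smooth

print("residual std:", rough.std())  # roughly the injected noise level
```

The smooth plus the rough reconstructs the data exactly; that decomposition, not perfect reproduction of every point, is the goal of smoothing.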

Noise Response

Since the signal itself is quadratic, so is the best polynomial degree for smoothing. For higher degree polynomials, the excess wiggling — especially at the endpoints — is due to the noise in the data. If we fit a straight line (a 1st-degree polynomial) to some data, then we’ve modelled the data with two functions (f_0(t) = 1 and f_1(t) = t) so we require two parameters: slope and intercept. Now increase the order from 1st to 2nd for a quadratic fit. In one sense, the extra “function” we’re fitting is f_2(t) = t^2. But that function can easily have nonzero average value, so in a way it is “like” the function f_0(t) = 1 which we’ve already included in our fit.

What we’re really interested in is what this extra function does that the other functions don’t already do for us. This turns out to be captured, not by power functions f_j(t) = t^j, but by “orthogonal polynomials,” each of which is one degree higher, and all of which are “orthogonal” to each other (which, very loosely speaking, means they don’t duplicate the patterns found in the other polynomials).

The first two orthogonal polynomials are just f_0(t) = 1 and f_1(t) = t - \bar t (where \bar t is the average time value) that we’ve already been using. The next 8 orthogonal polynomials look like this:

poly2_5

poly6_9
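One standard way to construct such orthogonal polynomials is Gram-Schmidt orthogonalization of the power functions over the observed sample times. Here’s a sketch (a generic construction, not necessarily the exact recipe behind the figures):

```python
import numpy as np

def orthogonal_polys(t, max_degree):
    """Gram-Schmidt on the power functions t^0, t^1, ... evaluated at the
    sample times, yielding polynomials orthogonal over those samples."""
    basis = []
    for j in range(max_degree + 1):
        v = t.astype(float) ** j
        for u in basis:
            v = v - (v @ u) * u          # remove what earlier polys capture
        v = v / np.linalg.norm(v)        # normalize
        basis.append(v)
    return basis

t = np.linspace(-0.5, 0.5, 101)
polys = orthogonal_polys(t, 9)

# Endpoint growth: the higher-degree orthogonal polynomials are largest
# near the edges of the time span, as the figures show.
print(abs(polys[9][0]) > abs(polys[9][50]))   # True: edge beats middle
```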

Two things are worth noting near the beginning and end of the observed time span. First, the values are larger, more so the larger the degree of the orthogonal polynomial. Second, the wiggles get closer together, i.e. they get faster. These properties of the fit functions persist in the final result, so when we use a high-degree polynomial we tend to get much larger uncertainty (i.e. noise response) as well as bigger and faster wiggles (also due to noise alone) near the beginning and end, exactly as we observed with our 8th-degree polynomial fit. A higher polynomial degree usually only makes things worse.

The probable error of a polynomial smooth which is due to noise alone is determined by something which we can call the “uncertainty function,” which gives the expected contribution of that polynomial to the variance of the estimated y value at a given x value. The uncertainty function tallies variance, but we can take its square root to compute the “standard error function” giving the probable error as a function of time. Here it is for polynomials of degree zero (a constant) up through degree 5, for polynomials which cover the time span -0.5 \le t \le 0.5:

stderr_fun
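For an ordinary least-squares polynomial fit this uncertainty function has a standard closed form: the variance of the estimated value at time t is \sigma^2 x(t)' (X'X)^{-1} x(t), where X is the design matrix. A sketch:

```python
import numpy as np

def stderr_function(t, degree, sigma=1.0):
    """Standard error of a degree-`degree` polynomial fit, evaluated at
    the fit times themselves, for white noise of std `sigma`:
    var(yhat(t)) = sigma^2 * x(t)' (X'X)^{-1} x(t)."""
    X = np.vander(t, degree + 1)
    cov = sigma**2 * np.linalg.inv(X.T @ X)
    var = np.einsum('ij,jk,ik->i', X, cov, X)   # diagonal of X cov X'
    return np.sqrt(var)

t = np.linspace(-0.5, 0.5, 201)
se5 = stderr_function(t, 5)

# Endpoint inflation: roughly three times the mid-span uncertainty
# for a 5th-degree fit, matching the figure's description.
print(se5[0] / se5[100])
```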

Note how the uncertainty (i.e. the contribution of noise to the smooth function) is exaggerated near the endpoints, the more so the higher the polynomial degree — with just a 5th-degree polynomial, the standard errors are already three times as large at the endpoints as in the middle. And don’t forget about that extra wiggling near the endpoints too; the combination of exaggerated endpoint uncertainty and exaggerated endpoint wiggling makes polynomial smoothing with degree higher than 3 or 4 extremely untrustworthy near the endpoints of the time span.

Function Misfit

The “fast” (high-degree) polynomials had too much wiggling at the endpoints, but the slow (2nd-degree) one worked fine. Of course that’s because the signal itself was a 2nd-degree polynomial. For Fourier series, on the other hand, even the “slow” case didn’t fit very well. The fit is poor because Fourier series are designed to create periodic functions: whatever smooth function a Fourier fit returns will actually be periodic, with period equal to the time span of the observed data. In fact we get the same smooth function if we fit a Fourier series to repeated copies of the data:

Fourier_periodic

Note that in order to repeat, it has to dive back down at the end of each “cycle” toward those low values at the beginning of the next “cycle.” To do so, it has to exaggerate the wiggles, especially at the end. And that’s just to fit the signal, even without any noise. This is another case where the essential properties of the functions we’re using persist in the final result.

There are many ways to ameliorate this (and other) problems, but none of them entirely eliminate it. The fact remains that periodic functions have essential tendencies which persist in any Fourier-based smooth, and the problematic aspect is the behavior of the smooth near the endpoints.

It should be mentioned that for times well away from the endpoints, a polynomial smooth and a Fourier-based smooth both give outstanding results if the “time scale” (cutoff frequency for Fourier, polynomial degree for polynomials) is well chosen.

A More Generic Smooth

We’ve tried using classes of functions (polynomials, Fourier series) and restricting them to the “slow” ones in order to keep things sufficiently “smooth.” Perhaps instead we could seek some completely general smooth function which optimizes some criterion which combines both “fit” (how closely does it match the data) with “smoothness.” It’s easy to define how well it fits the data — the sum of the squares of the residuals is only the most common of many methods. But how do we define “smoothness” for some function in general?

The idea is that it’s the bending of the smooth curve that accounts for its “roughness,” and that the bending is measured by the second time derivative of the function. Of course, for that to exist the function has to be twice-differentiable, but that’s fine because we want a nice “smooth” function. It may not be “technically” smooth (infinitely differentiable) but it will at least be smooth-looking.

To measure the goodness-of-fit (or should I say badness-of-fit), take the usual sum of the squared residuals. To measure the roughness, integrate the square of the 2nd derivative over the observed time span. Combine these two quantities into a weighted average, giving more weight to the roughness if you want an extra-smooth smooth but more weight to the badness-of-fit if you want an extra-good fit. Hence this method involves a parameter (actually it can involve many, but let’s not get into details) which controls how “smooth” the final smooth will be. This is nothing new: with polynomials we controlled smoothness by polynomial degree, and with Fourier series by cutoff frequency.

Then: find the function which minimizes the weighted average of badness-of-fit and roughness. The solution turns out to be a function which is not smooth, i.e. not infinitely differentiable, but is piecewise-smooth, i.e. it’s made of a finite number of pieces which are themselves smooth. Furthermore, the pieces are joined as smoothly as possible by requiring that where they meet, they have the same value, the same derivative, and the same 2nd derivative. The result is called a spline smooth. The pieces themselves turn out to be cubic polynomials, so the smooth function is sometimes referred to as a “cubic spline.” If we apply this method to our toxicity data, with a reasonably smooth smooth we get this:

spline9
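In practice one rarely codes this minimization by hand. Here’s a sketch using SciPy (note: `UnivariateSpline` controls the fit/smoothness trade-off through a residual budget `s` and automatic knot placement rather than the explicit roughness penalty described above, but the effect is the same; the data are invented stand-ins for the figure’s values):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical stand-ins for the toxicity data (the real values appear
# only in the post's figure).
x = np.arange(1, 10, dtype=float)
y = np.array([0.5, 0.9, 2.1, 3.8, 6.2, 9.1, 12.4, 16.3, 20.9])

# s=0 forces interpolation through every point; a larger s allows more
# residual in exchange for a smoother (less bendy) cubic spline.
wiggly = UnivariateSpline(x, y, k=3, s=0.0)
smooth = UnivariateSpline(x, y, k=3, s=2.0)

print(smooth(4.5))    # interpolated severity from the smooth spline
```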

Global and Local Smooths

Fitting functions like polynomials or Fourier series to the entire set of data, and finding a function which optimizes some measure of total goodness as a spline smooth does, might be called “global” smoothing methods because they fit a smooth to the entire data set, both computationally and conceptually. However, one can also look at smoothing as a local problem, in which the value of the smooth at some particular time is determined by the data values which are nearby in time to the given moment. In the next post, we’ll take a look at some methods and issues related to local smoothing.

Smooth

NASA’s Goddard Institute for Space Studies (GISS) has updated their global surface temperature estimate to include November 2013. It turns out that this most recent November was, globally, the hottest on record:

giss_nov

Greg Laden posted about it (and other things) recently in his continuing efforts to let people know what’s really happening to the globe (it’s still heating up) as well as spreading the word that “earth” includes a lot more than just the atmosphere. He featured this version of the graph (provided by “ThingsBreak” but prepared by Stefan Rahmstorf):

HottestNovemberOnRecord_2013

Of course this means that the fake skeptics must come out of the woodwork. Referring to the smooth (the red line on the graph), here’s what Paul Clark had to say about it:


It’s not clear how this red line was obtained. The red line is not described on the poster’s page. The graph comes from, what Laden describes as, “climate communicator” ThingsBreak. What on earth is a “climate communicator”?!

It seems to be some type of smoothed moving average. Five year spline perhaps?

Problem is, the red line is roughly in the middle of the blue line, except at the end. At the end, the red line is not in the middle at all, but is down at the beginning, and up at the end, of that final 10 year period. It’s shooting right up at the end!

How can that be? I therefore find this line to be completely made up, and a case of wishful thinking.


Here’s something about which every honest participant in the discussion of man-made global warming should think. Carefully. Namely, this: Paul Clark complains that it’s not clear how the red line (the smoothed version of the data) was obtained. Furthermore, it doesn’t seem right to him. How does he react?

Did he acquire in-depth knowledge of smoothing techniques? (I can tell you for a fact: no he didn’t.) Did he consult a disinterested expert? (Apparently not.) Did he, oh I don’t know, maybe ASK how it was obtained? (Nope.)

You see, those are some of the ways an actual scientist might proceed. The guiding principle being this: LEARN MORE ABOUT THE SUBJECT *BEFORE* YOU OPEN YOUR MOUTH.

It seems that’s not Paul Clark’s way. He doesn’t think the smooth (red line) looks right, but with little to no effort at all to find out about it, he declares that it is “completely made up, and a case of wishful thinking.” I declare that Paul Clark’s opinion is completely mistaken, and just about as clear a case of the Dunning-Kruger effect as you’re likely to find.

Here’s something else worth thinking about: suppose I wanted to make the slope at the end artificially large. What smoothing method — other than “force it by hand” — could do that?


Rahmstorf used a smoothing method based on MC-SSA (Monte Carlo singular spectrum analysis; Moore, J. C., et al., 2005. New Tools for Analyzing Time Series Relationships and Trends. Eos, 86, 226, 232) with a filter half-width of 15 yr. I get a very similar result using my favorite method (a “modified lowess smooth”) with about the same time scale.

giss_nov2

My modified lowess smooth is in agreement with Rahmstorf’s MC-SSA smooth. Here’s just the modified lowess smooth (in red), a plain old lowess smooth (in green) for those who don’t trust me to modify anything, and a spline smooth (in blue):

giss_nov3

One of the things I like about my own smoothing program is that it also calculates the uncertainty of the result. Here are the three smooths I computed, together with dashed red lines to show the range 2 standard deviations above and below:

giss_nov4

The three methods are in agreement, within the limits of their uncertainty. Clearly.

Now let’s take the range of the modified lowess smooth which we plotted in the previous graph, and add some other smooths set to about the same time scale for smoothing: an ordinary moving average in black, a Gaussian smooth in green, and a 6th-degree polynomial (as used by Paul Clark himself) in blue:

giss_nov5

The moving-average line stays within the range indicated by the modified lowess smooth, but that’s easy because the moving averages don’t extend to the ends of the time series; we lose years at both the beginning and end. The Gaussian smooth stays within the range indicated by the modified lowess smooth except at the end, where the Gaussian smooth levels off. Is Paul Clark wondering why that might be? Does he know enough about smoothing in general, and about Gaussian smoothing specifically, to have expected that? I did.

Perhaps most interesting is the 6th-degree polynomial, which wanders outside the modified lowess range, not just at the beginning or end but in the middle as well. What’s really interesting is why it wanders outside the range, because it happens for different reasons at different times! The 6th-degree polynomial fit smooths too much in the middle of the time span, but smooths too little near the endpoints. Is Paul Clark wondering why that might be? Does he know enough about smoothing in general, and about polynomial fits specifically, to have expected that? I did.
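For the curious, the core idea of lowess — a weighted straight-line fit near each time, with nearby points weighted most heavily — can be sketched in a few lines of Python. This is the plain vanilla version, not my modified one, and the data are synthetic:

```python
import numpy as np

def local_linear_smooth(t, y, half_width):
    """Bare-bones lowess-style smoother: at each time, fit a weighted
    straight line to nearby points (tricube weights) and take its value.
    This is NOT the blog's 'modified lowess', just the basic idea."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    for i, t0 in enumerate(t):
        d = np.abs(t - t0) / half_width
        w = np.where(d < 1, (1 - d**3) ** 3, 0.0)   # tricube weights
        W = np.diag(w)
        X = np.column_stack([np.ones_like(t), t - t0])
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        out[i] = beta[0]                             # fitted value at t0
    return out

# Synthetic "warming" series: linear trend plus noise, annual steps.
t = np.arange(1975, 2014, dtype=float)
y = 0.017 * (t - 1975) + np.random.default_rng(1).normal(0, 0.1, t.size)
print(local_linear_smooth(t, y, half_width=15)[-1])
```

Because the local line is evaluated at every time, including the first and last, this kind of smooth extends all the way to the endpoints — unlike a moving average — though with inflated uncertainty there.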

Ordinarily, this is where I would launch into a technical discussion of smoothing. Why do certain methods tend to go one way more than another? What should one expect near the endpoints of the time span? How do smooths with longer time spans compare to those with shorter time spans? Why is the Gaussian smooth questionable near the endpoints? Why do high-degree (and 6 is a pretty high degree) polynomial fits really, really suck as smoothing methods, especially near the endpoints of the time span? Yes, they really suck, and the reason is actually quite interesting.

But I’m not gonna. At least not yet. It’s not my job to educate ignorant Dunning-Kruger victims about smoothing techniques.

But here’s an offer for Paul Clark: Come to this blog, find this thread, and post a comment in which you admit — without a bunch of caveats or excuses or bullshit — just admit in no uncertain terms that you don’t know enough about smoothing to know how valid Rahmstorf’s MC-SSA smooth is or why your 6th-degree polynomial choice is a really really sucky choice. You don’t have to weep and moan, just simply admit that you don’t know enough about this topic to justify your opinion. You don’t have to admit anything else, just that you’re ignorant about smoothing methods. Don’t clutter the comment up with unrelated stuff, if you want to spew about other things put that in a separate comment. Just a single, simple admission of ignorance on this topic.

If you’ll do that, Paul Clark, then I’ll do a blog post on smoothing. Or maybe two. Maybe even three — it’s a topic of great interest for me. How ’bout it, Paul? All you have to do is admit that you’re ignorant of the subject, and I’ll educate you.

In case that offer isn’t acceptable, here’s another. Paul: I’ll blog about the topic and you don’t even have to admit anything. But if you want me to supply some lessons without you admitting your ignorance — pay me. Cash American.

Fire Down Below

Australian prime minister Tony Abbott got elected saying, among other things, that global warming science was a bunch of crap.

Now he says that “Climate change is real, as I’ve often said, and we should take strong action against it…” Why the amazing massive ginormous flip-flop? Because Abbott is feeling the heat. So are a lot of Australians as they suffer through tremendous bushfires devastating huge areas of New South Wales. Australia has always been prone to fire, but the scale of this event is astounding. So too is the timing — it isn’t even summer yet down under. But it’s absolutely clear that “fire season” has been getting longer in Oz, starting earlier and ending later. And the reason for this very early outbreak: an extra-hot and extra-dry winter, exacerbated by — you guessed it — man-made climate change. Global warming.

Continue reading

The NIPCC report

Many of you are probably aware of a “report” which is intended to contradict the IPCC (Intergovernmental Panel on Climate Change) report. Its authors call it the “NIPCC” report for “Non-governmental International Panel on Climate Change.” It’s supposed to represent the very best that so-called “skeptics” have to offer.

Continue reading