Tag Archives: mathematics

Sampling Rate, part 2

“Suppose we have a signal which is band-limited, say it’s limited to the frequency band from 0 to 0.5 cycles per day,” says the engineering professor to the class in digital signal processing. “If we observe this signal at regular intervals with a sampling rate which is at least twice the bandwidth — in this case, at least once per day — then we can use Fourier analysis to reconstruct the signal. We can even interpolate it to fill in the gaps. This is one of the most common applications of Fourier analysis in the real world — we observe a signal, then use its Fourier transform either to reconstruct the signal or simply to identify its Fourier components (and therefore its physical nature).”

Sampling Rate

T, from my mechanical engineering world we have strict rules on sampling rates vs. signal frequency rates. I.e., you cannot reliably measure a 60 Hz AC sine wave with a 5 Hz analog sampling device. You end up with strange results that don’t show spikes well and might not show averages well either. Can you help me understand how proxies sampled every 120 years can resolve relatively high frequency temperature spikes?

This objection comes up so often from those who are accustomed to data which are evenly sampled in the time domain, and the misconception is so firmly imprinted on so many people, that it’s worth illustrating how uneven time sampling overcomes such limitations.
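As a rough illustration of that point, here is a minimal sketch using the reader's own numbers: a 60 Hz sine wave observed at an average of 5 samples per second. The random observation times, the 40-second span, the trial frequency grid, and the plain Fourier-sum periodogram are all assumptions chosen for this example, not the method of the original post.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
signal_freq = 60.0               # Hz, the reader's AC sine wave
n_samples = 200                  # assumed number of observations
duration = 40.0                  # seconds, so the mean rate is 5 samples/s

def signal(t):
    """Noise-free 60 Hz sine wave with unit amplitude."""
    return np.sin(2.0 * np.pi * signal_freq * t)

# Even sampling at exactly 5 Hz: every sample lands on a zero crossing,
# so the 60 Hz wave is invisible in the data.
t_even = np.arange(n_samples) / 5.0
x_even = signal(t_even)

# Uneven sampling: same number of points, random times over the same span.
t_uneven = np.sort(rng.uniform(0.0, duration, n_samples))
x_uneven = signal(t_uneven)

# Simple periodogram: Fourier sum at each trial frequency, scaled so that a
# unit-amplitude sinusoid gives power close to 1 at its true frequency.
trial_freqs = np.linspace(1.0, 100.0, 4951)   # 0.02 Hz steps (assumed grid)
phases = np.exp(-2j * np.pi * np.outer(trial_freqs, t_uneven))
power = np.abs(phases @ x_uneven) ** 2 * (2.0 / n_samples) ** 2

best_freq = trial_freqs[np.argmax(power)]
print("largest even-sampled value:", np.max(np.abs(x_even)))
print("peak frequency from uneven sampling:", best_freq)
```

With even sampling the data contain essentially nothing but rounding error, while the unevenly sampled data should put the periodogram peak at the true frequency, far above the 2.5 Hz limit that even sampling at 5 Hz would impose.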

Back to School

Much of what’s wrong with the online discussion of global warming is revealed by a recent reader comment on RealClimate.

Greg Goodman thinks that he’s taking climate scientists to school — he actually “lectures” the RealClimate readership about their supposed need to “dig a bit deeper” into the data on Arctic sea ice (both extent and area). He shows a graph based on some analysis which — unbeknownst to him — actually reveals that he doesn’t know what the hell he’s doing. He thinks he has established the presence of “cyclic variations” of which the climate science community is ignorant, and concludes that climate scientists are missing “important clues” about “internal fluctuations” which, of course, those inadequate computer models just can’t handle.

One would be hard pressed to find a more clear-cut example of hubris.

Climate scientists who study sea ice have been all over the data, every piece of it, but instead of making the mistakes Goodman makes they’ve been as careful and rigorous as their expertise and experience allow. They have certainly dug a whole helluva lot deeper than Greg Goodman has, or probably is capable of. It’s Goodman who needs to go back to school.

Theil-Sen

A reader recently inquired about using the Theil-Sen slope to estimate trends in temperature data, rather than the more usual least-squares regression. The Theil-Sen estimator is a non-parametric method to estimate a slope (perhaps more properly, a “distribution-free” method) which is robust, i.e., it is resistant to the presence of outliers (extremely variant data values) which can wreak havoc with least-squares regression. It also doesn’t rely on the noise following the normal distribution; it’s truly a distribution-free method. Even when the data are normally distributed and outliers are absent, it’s still competitive with least-squares regression.
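To make the robustness concrete, here is a minimal sketch of the Theil-Sen slope as the median of the slopes between all pairs of points, compared with a least-squares fit on the same data. The function name `theil_sen_slope` and the toy data (a noise-free line of slope 2 with one large outlier at the end) are assumptions made up for this example.

```python
from itertools import combinations

import numpy as np

def theil_sen_slope(t, x):
    """Median of the slopes between every pair of points with distinct times."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    slopes = [
        (x[j] - x[i]) / (t[j] - t[i])
        for i, j in combinations(range(len(t)), 2)
        if t[j] != t[i]
    ]
    return float(np.median(slopes))

# Toy data: an exact straight line with slope 2, plus one wild outlier.
t = np.arange(11, dtype=float)
x = 2.0 * t + 1.0
x[-1] += 100.0                      # hypothetical outlier at the last point

ts_slope = theil_sen_slope(t, x)
ols_slope = np.polyfit(t, x, 1)[0]  # ordinary least-squares slope

print("Theil-Sen slope:    ", ts_slope)
print("least-squares slope:", ols_slope)
```

Only 10 of the 55 pairwise slopes involve the outlier, so the median stays at the true slope, while the least-squares slope is pulled well away from it.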

Hiatus

I haven’t posted much lately because I’ve been hard at work on my new book. It’s titled Understanding Statistics, and I expect to finish in a week or two. I’ll be sure to post here when I do, hoping that lots of you will buy it. Even if you don’t need one for yourself, you might know somebody who would enjoy and make good use of it. Who knows, maybe 20 of you will send a copy to Anthony Watts. Maybe he would learn something from it. Irony of the richest kind.

Your Servant

Every mathematician develops his own preferences for notation. This is necessary because there are often (I’m tempted to say “usually”) many notations for the same concept.

Skin a Cat

Before I begin let me make it clear that this is not about abusing cats. I love cats. We have a cat. We treat him very well. He treats us as though it’s our duty to worship him. He’s a cat.

This is about the old adage that “there’s more than one way to skin a cat.”

Nothin’ but Noise

Pat Michaels claims that the journal Nature has lost its credibility. That’s an extraordinary claim, considering that Nature is one of the most prestigious peer-reviewed science journals in the world. There are those who believe Pat Michaels is the one lacking any credibility.

L is for “linear”

A previous post addressed some issues with linear regression, “linear” meaning we’re fitting a straight line to some data. Let’s devote another post to scrutinizing the issue. This post is all about the math; readers who aren’t that interested can rest assured we’ll get back to climate science soon.

It was mentioned in a comment that least-squares regression is BLUE, the “best linear unbiased estimator.” In this acronym, “B” is for “best,” meaning “least-variance,” but for practical purposes it means (among other things) that if a linear trend is present, we have a better chance to detect it with fewer data points using least-squares than with any other linear unbiased estimator. “U” is for “unbiased,” meaning that the line we expect to get is the true trend line. Both of these are highly desirable qualities.

Finally, “L” is for “linear,” which in this context has nothing to do with the fact that our model trend is a straight line. It means that the best-fit line we get is a linear function of the input data. Therefore if we’re fitting data x as a linear function of time t, and it happens that the data x are the sum of two other data sets a and b, then the best-fit line to x is the sum of the best-fit line to a and the best-fit line to b. In some (perhaps even many) contexts that is a remarkably useful property.
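The additivity property is easy to check numerically. The sketch below fits straight lines to two made-up series `a` and `b` and to their sum; the series, the random seed, and the use of `numpy.polyfit` for the least-squares fit are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)      # fixed seed so the sketch is repeatable
t = np.arange(50, dtype=float)      # common time axis

# Two hypothetical data sets sharing the same times.
a = 0.3 * t + rng.normal(size=t.size)
b = -0.1 * t + 2.0 + rng.normal(size=t.size)

# Least-squares straight-line fits; each result is [slope, intercept].
fit_a = np.polyfit(t, a, 1)
fit_b = np.polyfit(t, b, 1)
fit_sum = np.polyfit(t, a + b, 1)

# Linearity: the fit to the sum equals the sum of the fits.
lines_add = np.allclose(fit_sum, fit_a + fit_b)
print("fit to a + b:   ", fit_sum)
print("fit_a + fit_b:  ", fit_a + fit_b)
print("equal (to rounding):", lines_add)
```

A median-based estimator generally does not share this property, which is one reason the “L” is worth spelling out.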

Gutenberg-Richter

In a comment on the last post, it was mentioned that the frequency of earthquakes of any given magnitude or greater will be given by the Gutenberg-Richter law. It states that the expected number of earthquakes in a given region over a given span of time, of a given earthquake magnitude or greater, will be

$N = 10^{a-bM}$,

where $M$ is the quake magnitude, $a$ and $b$ are constants, and $N$ is the expected number. For active regions, the constant $b$ usually has a value near 1.
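For a feel of what the formula implies, here is a small sketch that evaluates it directly. The value $a = 5$ is a made-up constant for illustration (real values depend on the region and time span), and `expected_count` is a name invented for this example.

```python
def expected_count(magnitude, a, b=1.0):
    """Gutenberg-Richter expected number of quakes of this magnitude or greater."""
    return 10.0 ** (a - b * magnitude)

a_example = 5.0                          # hypothetical constant, not a fitted value

n_m5 = expected_count(5.0, a_example)    # quakes of magnitude 5 or greater
n_m6 = expected_count(6.0, a_example)    # quakes of magnitude 6 or greater
ratio = n_m5 / n_m6

print("N(M >= 5):", n_m5)
print("N(M >= 6):", n_m6)
print("ratio:", ratio)
```

With $b = 1$, each full step up in magnitude cuts the expected count by a factor of 10, whatever the value of $a$.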