Smooth 2

[Note: please see the UPDATE at the end of this post]

In the last post we looked at smoothing data by fitting a polynomial, and by fitting a Fourier series. We found that we could control the degree of “smoothness” in either case (by polynomial degree or by cutoff frequency) and that with the right amount of smoothness both methods did a reasonable (and sometimes outstanding) job over most of the observed time span, but did much worse near the endpoints of observation. Let’s forget about the endpoint mess for the moment, and concentrate on the performance of the smoothing near the middle of the window of observation.

We’ll make it as simple as possible. Suppose we have a lot of observations xk, k = 1,2,…,N of some variable over time, and the times of observation extend from -0.5 to +0.5 so the width of the entire window is exactly 1 time unit. We further insist that the observations are evenly spaced throughout time. What would our smoothing techniques give for the smoothed value at the time t = 0, i.e. at the midpoint of the observation window?

Let’s take our data and fit a polynomial of degree zero. This is just a constant, and the least-squares solution to xk = const is f(t) = $\bar x$, where $\bar x$ is the average of all the xk

$\bar x = {1 \over N} \sum_{k=1}^N x_k = {1 \over N} \sum_{k=1}^N w_k x_k$,

where wk = 1 for all k. Note that at the last step I’ve inserted a mysterious and apparently unnecessary “weight vector” wk — unnecessary because in this case it happens to be equal to 1 for all k.

Since our “smooth” is a constant throughout time, its value at time zero is equal to that constant, i.e., to $\bar x$, which is equal to the mean value of the product of a “weight vector” wk with the data vector xk.

Now let’s fit a 1st-degree polynomial (a straight line) instead of a constant. This will certainly give us a different smooth! But it turns out that the difference makes no contribution, none at all, to the smoothed value at time t = 0. It’s still the mean value of a “weight vector” wk with the data vector xk, and the weight vector is still 1 for all k values so it’s also equal to the arithmetic mean of the data.

Now fit a 2nd-degree (quadratic) polynomial. In this case, the smoothed value at time t = 0 is still the mean value of a weight vector times the data vector, but the weight vector is no longer all 1’s. Instead, it looks like this over time:

Note that the largest (absolute) values of the weights are near t = 0, so data points near (in time) to t = 0 have more influence on our smoothed value at t = 0 — as they should. Note also that large values near the extremes tend to make the smoothed value at t = 0 negative because their weights are negative, and by doing so they define the contribution of the quadratic term to our smoothed estimate at t = 0.

Going to 3rd degree (quadratic to cubic) make no change in the weight vector, as did going from constant to linear. In fact, because we’re only considering the smoothed value at t = 0 and our observations are equally spaced, the odd-numbered polynomials don’t really affect this.

But at 4th degree our weight vector has more wiggles

and at 6th degree more still

In fact if we go all the way to a 24th-degree polynomial, the weight vector looks like this:

It’s taking on a characteristic shape. As the order of the polynomial increases, the weight vector which defines the smoothed value at t = 0 approaches the sinc function, plotted here in red atop the weight function for polynomial degree 24:

In both cases, the largest (absolute) values are the weights near t = 0 so data values near then will dominate the smoothed estimate. It’s almost like our smoothed value at t = 0 will be a “weighted average” of the data values near t = 0. This isn’t actually correct because some of the weights are negative so it’s not really a weighted average, it would better be called a “mean weighted value” or something like that.

Let’s try something only slightly different. Let’s smooth our data with a 1st-order Fourier series, then compute the smoothed value at time t = 0. This too turns out to be the mean weighted value of a weight vector wk with the data xk, with this weight vector:

That’s certainly reminiscent of the weight vector for a 2nd-degree polynomial. Let’s go to a 2nd-order Fourier series to get this weight vector

Hmmm… I see a pattern emerging here. Let’s kick it up to an 8th-order Fourier series:

Once again, the smoothed value at t = 0 is dominated by data values near t = 0, with a wavy-looking weight function to account for more complex behavior. And once again, we can compare this weight function to the sinc function and find it matches even better:

This suggests that it might be useful, when smoothing, to focus attention mostly — or even only — on the data values which are actually nearby (in time) to the moment at which we’re trying to compute our smooth. We might even define a weight function of our own, and use it to compute a “mean weighted value” by shifting its peak from moment to moment — placing the peak of the weighting function at whatever particular time t (whether zero or not) we’re focusing on at the moment.

In fact, for higher polynomial degree or Fourier order, computing the entire smooth (not just at t = 0) with polynomials or Fourier series is actually quite close to just that — taking a weight function (the one for “at t = 0″) and shifting it through time to compute a “mean weighted value” at each moment which depends mostly on the data values near that moment. The process itself — a “sliding” mean weighted value — is nothing more nor less than the convolution of the data with a filter, with the filter being the weight vector itself.

All this nice filter stuff depended on our estimating the smooth in the middle of the time span, or at least not being near the edge of time. Rather than consider how edge effects play out in our polynomial or Fourier smooths, or how things change when the time sampling is uneven (which can get quite complicated), let’s focus on the essential idea that our smooth should depend on the data nearby in time, and elevate that to the status of a general principle. Essentially, instead of smoothing data by fitting some function or sequence of functions globally, let’s compute smoothed values based on using mainly — or only — local data.

Of course that raises the issue of what “local” means. We are free to choose whatever definition suits our need, and in particular by making “local” mean a very narrow or very broad slice of time we can control the degree of “smoothness” of our estimate, just as we did for global fits by varying their parameters (polynomial degree or cutoff frequency).

The Simplest Smooth

The simplest way to estimate the smoothed value at a particular moment of time would be simply to average the data near that moment. Since we’ve elevated “nearby in time” to a principle, we do have to choose how nearby in time, i.e. how wide our “averaging window” is in order to set the time scale of the smoothing.

Let’s try the simplest thing of all: just cut time into slices, maybe even make them all the same width, and average the data within each slice. We’ll even call each slice a “bin” just to use the same terminology as others. Heck we could even compute a standard error associated with that average and a confidence interval if we wanted to [note: we will utterly ignore autocorrelation and treat the noise as “white noise”]. We could do this with, say, 10-year long time slices of annual average global temperature from NASA GISS, to get this (data in black, 10-year averages with error bars large red dots):

That’s great, in fact we do this kind of thing all the time — but is it smoothing? It doesn’t compute values at all the observed times and it isn’t defined for in-between values. It’s not even clear how to compute residuals.

But hey — it does accomplish the purpose (or at least, one of the purposes), namely, that it reduces the noise a lot more than it alters the “signal”, if such things exist. More correctly it reduces the total variation of the “fast” part much more than it interferes with the variation of the “slow” part, so it has at least estimated the slow variation for us.

And that’s one of the purposes of smoothing. In some particular cases it may be the only relevant purpose, in which case this simple computation of averages has done the job. Something to be aware of is that there are no “smooth police” to make sure you do everything according to traditional standards. If a particular method tells us what we wanted to know without leading us astray, then it’s doing its job.

Another benefit of this simple averaging scheme is that we can describe the “essential” (i.e. slow, or long-term) changes using fewer data points. This used to be one of the most valuable aspects, because it means fewer calculations. For extremely large data sets, computing the Date-Compensated Discrete Fourier Transform oversampling the spectrum at 20 times the “nominal” frequency resolution can be a bit of a problem, requiring prohibitive resources for data storage/access and arithmetic calculations. If instead we analyze the averages rather than the raw data, we can reduce our computational workload while losing almost no information about long-term behavior. We will lose the information about short-term (fast) behavior, but if we’re interested in the long-term then that’s OK. [Important note: if you do this, in the “reduced” data set make the time equal to the average time of all data within the bin, just as the value is the average value within the bin.]

And for those anal-retentive smoothers who insist on a function defined at all intermediate times, we can even get this result from a least-squares fit of “step functions” which are 10 days wide (or however wide our averaging interval is):

This enables us to interpolate (since it’s defined at all intermediate times), and to compute residuals if we want to study them. (Personally I’d rather define a complete function by the “connect-the-dots” procedure). For those who object that this “smooth” isn’t at all “smooth,” I suggest you enroll at the “smooth police academy.” Lesson 1: this kind of smoothing is out of your jurisdiction.

We don’t have to restrict ourselves to non-overlapping time slices. We could start with the first 10 years of data and use them to compute an average time and average data value, then instead of shifting the “averaging window” 10 years ahead, we shift it only 1 data point ahead in time and compute yet another average time/data. In fact we can keep doing this, advancing the averaging window by 1 data point each time, until our window finally covers the final data point. This will give us a set of “average times” and “average values” which, when applied to global temperature from NASA, looks like this:

This smooth isn’t very smooth either, which is only a problem if we care about that. This is the simplest case of what’s called a “moving average.”

But another, very real problem is that the “smoothed” data covers a smaller time span than the raw data. Our very 1st bin average is (for evenly sampled data) halfway into the first “bin,” so we lose about half a bin’s worth of time at the beggining of the data set, and again at the end. Data from 1880 through 2013 will therefore become “smoothed data” from 1885 through 2008. That’s no problem deep in the heart of the 20th century, but if the thing you’re most interested in is the behavior near the end of a time series (what’s happening now), then losing part of the time span at the end is a distinct disadvantage of moving averages.

There are ways to extend the moving averages to the edges of the observed time window. We could, for instance, use a smaller window as we get near the edge of time. Alternatively, we could “pad” the raw data, adding data points at times before/after observations. What data values to add is a tricky problem! Sometimes one will pad a time series with the 1st data value at the beginning and the last data value at the end. Or, one can pad the time series with the mean data value. Perhaps most common (in some contexts) is to pad the time series with values of zero, which is sometimes OK but sometimes disastrous! (This is just me, but personally I hate this whole strategy — I never met a padding method I didn’t dislike).

When we computed our averages (either moving or not), we simply averaged the data in the given bin. We could substitute a “weighted average,” or, if some of the weights are negative, a “mean weighted value,” for the straight average. This is the “filter” idea we talked about earlier when discussing global fits of polynomial or Fourier series. The averages we’ve computed so far, giving equal weight to each data point within the averaging window, is nothing more nor less than the simplest example of such a filter, for which the weights are all equal within the averaging window but all zero outside of it. This is sometimes called a “boxcar filter.”

One obvious strategy is to give more weight to nearer (in time) data values by using larger weights in the center of our filter. A common choice is a “wedge filter,” which increases linearly from its left edge to the middle, then decreases linearly back to zero. We control the degree of smoothing by varying the width of the averaging window (the part of the filter which is non-zero). Another common choice is to use weights which follow the normal distribution, a case called “Gaussian smoothing” (since the normal distribution is also called a “Gaussian”). We can’t choose the width of the window any more because the window is infinite (Gaussian weights approach but don’t get to zero), but we can choose the width of the Gaussian itself as our control parameter for smoothness since Gaussians include a “scale parameter” (the standard deviation if it’s a probability density) which is ideal for adjusting the time scale.

Here, for instance, are the results of a 10-year wide boxcar filter (in blue), a 12-year wide tent filter (green), and a Gaussian smooth with scale parameter 3 years (red):

The three smooths are so close that it’s a bit hard to tell them apart on the graph, except at the edges. And that’s a good thing — it means that we’re getting consistent results from one method to another.

Notice that I extended the Gaussian smooth all the way to the edges of the time span. That’s not necessarily a good idea, and in this case they are not to be trusted. The reason is that a Gaussian smooth is still an “average” (weighted) over a “window”, and the data may show some trend over time so that earlier or later data are rising or falling. That’s no problem in the middle of the time span because at any given moment, the data which are earlier might be lower-than-average and the data which are later in time might be higher-than-average but their deviations will offset each other to cancel out and return a reliable whole-bin average. But at the very edge of the time span there is no data “later in time”, there’s only the “earlier in time” half with its below-average values to make our estimate come out below average. Like here:

The data are artificial, a straight line plus white noise, but the Gaussian smooth curves way too high at the start and too low at the end and the difference is substantial. I did use a wide Gaussian filter (scale parameter 10 years) to exaggerate the effect, but it’s very real and in some practical cases quite pronounced. Although this example has used a Gaussian filter, the “trend at the beginning or end” problem affects moving averages in general, not just Gaussian smooths. This is the principle reason that in some fields one usually does not extend a smoothing filter all the way to the edge, but restricts oneself to “complete” windows.

The bottom line is that “edge effects” in smoothing are hard to suppress, and in fact it’s impossible to eliminate them. But there are things that can help, both with global smooths (like polynomial/Fourier series) and with local smooths. Also, the filter idea is truly excellent, brilliant in fact, with lots of advantageous mathematical properties, until the time sampling is uneven, when things go horribly wrong.

All of which leaves us plenty to talk about in the next smooth post.

UPDATE UPDATE UPDATE

There’s a person who calls himself “Amego Sando” associated with a website called “Nouvelles et satellite scientifique” who has a very bad habit.

He takes MY blog posts, and reposts them, word-for-word, with all the pretty pictures. Here’s one. At the top of each he claims this:

“Written by Amego Sando”

That is a lie. Amego Sando is nothing but a “petit” thief.

I don’t mind if people re-post my stuff, but I insist on two things:

2. Credit the author (me, not Amego Sando)

I’ve place a comment on that site demanding my posts be removed. I very much doubt that it will get any response, or even ever be approved.

So here’s a question for the more savvy among you: If this organization and the thief Amego Sando don’t set this right — how do I go about forcing them to do so?

21 responses to “Smooth 2”

1. Typo: “And for those anal-retentive smoothers who insist on a function defined at all intermediate times, we can even get this result from a least-squares fit of “step functions” which are 10 days wide (or however wide our averaging interval is):”

Should be: “10 years wide”.

2. Phenomenal- as usual. Very much appreciated.

3. In the comparison of boxcar, tent and gaussian filters, the colours assigned in the text differs from those assigned in the legend of the graph.

[response: Fixed, thanks.]

Amego Sando does not restrict their plagiarism to you. For example, he also plagiarizes Nath Hermosa. They have also attempted to plagiarize Steven Goddard. (As an aside, the Goddard post’s title depends on a straightforward misinterpretation of what was said.)

If he has plagiarized a significant institution, advising them of that may result in that institutions legal resources being brought to bear, which would solve your problem fairly quickly. Alternatively, a direct complaint to word press may result in their forcing him to take down the offending material (which is all of Amego Sando’s material, apparently). Beyond that, legal action may be required.

4. Homonym: principle instead of principal.

5. Complain with the domain registrar if emails to the registrant fail.

Domain Name: segula-spatial.com
Registry Domain ID: 1802701015_DOMAIN_COM-VRSN

Registrant Name: Dennis Jennings
Registrant Organization: Shabaket Rawaq
Registrant Street: Lebanon
Registrant City: Leb
Registrant State/Province: Leb
Registrant Postal Code: 11300
Registrant Country: LB
Registrant Email: elak-host@hotmail.com

Registrar WHOIS Server: whois.melbourneit.com
Registrar URL: http://www.melbourneit.com.au
Updated Date: 2013-06-27T19:40:34Z
Creation Date: 2013-05-20T18:55:21Z
Registrar Registration Expiration Date: 2014-05-20T18:55:21Z
Registrar: Melbourne IT Ltd
Registrar IANA ID: 13
Registrar Abuse Contact Email: abuse@melbourneit.com.au
Registrar Abuse Contact Phone: +61.386242300

6. I think I have tracked down the hosting.

The website’s IP is 69.73.188.52, a reverse IP look-up shows it’s hosted on frio.nswebhost.com, together with about 100 other domains (search here). http://www.nswebhost.com does not resolve, http://nswebhost.com just gives you an Great Success !
Apache is working on your cPanel® and WHM™ Server
page.

WHOIS for nswebhost.com goes to a P.O box for the registrant:
Name: WEB HOST
Organization: NSWEBHOST.COM
Street: P.O. BOX 1108
City: FULSHEAR
State/Province: TX
Postal Code: 77441
Country: US
Phone: +1.8003590620
Fax: +1.2815339930
Email: DOMAIN@NSWEBHOST.COM

The phone number belongs to HostingZoom.com
with contact email greg@hostingzoom.com

They have an abuse department abuse@hostingzoom.com

The fax number belongs to
Landis Greg, 3034 Wellspring Lake Dr, Fulshear, TX 77441

Going by the e-mail address greg@hostingzoom.com above I assume the companies are related.

Landis Greg is CEO of Landis Holding

http://www.manta.com/c/mttf04l/landis-holdings-inc

which offers webhosting services via http://www.jaguarpc.com/ . That site has adminstration announcements for nswebhosts.com servers in its forum.

It’s been a bit of sleuthing, hope it helps.

7. It may be a site with copies of lots of material set up to harvest people following Google searches. Could be serving ads, could be serving malware. The registration information (addres and phone) aren’t complete.
Domain Name………. segula-spatial.com
Creation Date…….. 2013-05-21
Registration Date…. 2013-05-21
Expiry Date………. 2014-05-21
Tracking Number…… 1802701015_DOMAIN_COM-VRSN
Organisation Name…. Dennis Jennings

Tech Name………… Shabaket Rawaq
Tech Email……….. email
Tech Phone……….. +961.96600000000
Tech Fax………….
Name Server………. ns1-frio.nswebhost.com
Name Server………. ns2-frio.nswebhost.com

8. Bluegrue and Matthias did much better sleuthing; what I posted is only evidence that they are hiding their info from a straightforward search.

Google is always a mystery but sometimes they downrate sites for spoofing searches. THey have a “webmaster tools” complaint procedure, tho one never knows who reads the complaints or if they’ll do anything

9. RichardLH

It appears to be a rather badly configured web scraping site. Most of the comments look as though they are ‘bot’ generated. You probably increased his site traffic by a large margin just by url linking to it. I would strongly suggest that you remove the url and substitute something that only humans can read.

• The links are set as “nofollow”. Google does not use such links for page ranking, other search engines may use them nonetheless.

10. FredT

His oldest post can be found there : http://www.segula-spatial.com/page/288 (so far), or http://www.segula-spatial.com/hello-world/ (by user admin001). He’s being pasting text from rather random sources from the beginning (or he setup a script to do this, attract traffic and display adds)… There’s not even a word about the guy behind.

11. Thanks for another installment in the series, Tamino.

Good luck with the plagiarism thing. My summary review of Mike Mann’s “The Hockey Stick And The Climate Wars” was ‘scraped,’ too.

Like you, I posted a demand that it be taken down, which was utterly ignored. But since, as far as I could tell, the repost wasn’t drawing any attention, and my version was–once I got my host to repost the article, that is, since the original was automatically pulled when the host software detected the duplication, and it took human review to rectify!–I decided that further action was a pointless waste of my time.

12. Jonathan

Interesting blog series! I get the feeling that smoothing near the edges is similar to extrapolation, since it seems you need to invent data points that you don’t have.

[Response: This is nonsense. Padding time series is only one strategy (one which I never use, by the way). You don’t have to “invent data points” for smoothing to the edge or for extrapolation. Perhaps you got this idea from some denialist, the kind who will say anything, however ridiculous.]

• jyyh

Well, it might seem like that but there are no invented data points. Rather, you calculate how the data behaves in the section where there’s full data, and further use that knowledge to reduce error in the ends…

• Jonathan

Wow, that was a rather aggressive response to my reflection on a part of your post. Looking forward to your next post, where I hope you will talk more about smoothing and uncertainties/errors near the edges. I guess I won’t comment anymore here, though.

[Response: If being rebuked on an internet blog is too unsettling, then restraint is well-advised.]

13. If instead of a line you fitted a band where the width of the band gave a confidence interval, then I guess you’d find the band getting bigger at either end, reflecting the fact that we don’t know as much about the behaviour there.

Is there a standard method to do this?

14. Let Eli, a very old bunny, describe a method of fitting straight lines that his grandpa used. You find the average value of some data y(x) and the midpoint of x. You place a dot at (,). You then take a mirror and place it with the shiny surface on the dot and rotate the mirror until the reflection does not bend in the center. Draw a straight line along the mirror and then reverse the mirror and do it again. If the linear model is good you get the same line, otherwise you may need a higher order fitting function. Now think about why this is equivalent to much of what Tamino wrote.

15. John, this information is delivered by least square methods. For linear functions the band looks like a bow tie tilted at an angle.

16. Apropos site-scraping, any time there’s a big change that attracts interest and money, the scrapers get very busy. I just heard from an old friend who has spent going on 30 years developing food plants, building up a resource. A guy with money and time bought a batch, then went and set up a couple of institutes and published a book and filed for patents and is claiming it’s all his work and idea. And since he’s got the money and didn’t spend any money or lifetime actually doing the development, he’s way more able to profit from the material than the people who did it.

It’s good material. It was meant to be shared for the benefit of the world.

Taking work without credit and sucking the energy away from the people it came from — by having more money and time to do the PR — is a really profitable approach.

[Response: This case is even worse — one of putting your own name on it after the words “written by.” It is about as unethical as it gets, and is patently illegal.]

17. Dave

Thanks for the great work you are doing. Hopefully I can help with your problem. What you refer to as plagiarism is in fact copyright infringement. You should be able to solve the problem by sending a take down notice to the ISP. Send me a reply and I will help you solve the problem.

18. John Wiltshire

On the plagarism – find the web host (lots of info above) and issue a DMCA takedown request (assuming the host is in the US). It’s exactly what the law is supposed to be for.