I had some further thoughts about the subject of the last post. Here they are.
I’m now more convinced that, if I have understood the procedure of Hansen et al. correctly, the “spread” or “dispersion” of the distribution of temperature anomaly (whether re-scaled or not) will depend on the baseline period chosen. More importantly, the relative values for different time spans (say, different decades) will also be baseline-dependent.
It seems to me that the variance for a given time span, say some decade, is the sum of two components: the spatial variance (basically, the different average values in different regions) and the temporal variance, which includes both the trend and the fluctuations. When we talk about variability of the weather (not climate!) we’re trying to isolate those very fluctuations from both the time trend and the spatial variations. Hansen’s method, if I read it correctly, doesn’t remove the spatial variations but includes them.
Suppose we have temperature data for $n$ times $t = 1, 2, \dots, n$, for a set of $k$ different regions/stations $A = 1, 2, \dots, k$, for a total of $N = kn$ data points. The data might be mean temperature for a single month, or for a single season (like the summer), or for the annual average, but we’ll assume there’s no annual cycle in the data. We can arrange the data into a matrix
$T_{tA}$ = temperature at time $t$ at station $A$.
The mean value at station $A$ will be

$$\bar{T}_A = \frac{1}{n} \sum_{t=1}^{n} T_{tA}.$$
We can then separate the data into the sum of station averages and local fluctuations

$$T_{tA} = \bar{T}_A + x_{tA},$$
where the $x_{tA}$ have the property that their station averages are zero:

$$\frac{1}{n} \sum_{t=1}^{n} x_{tA} = 0 \quad \text{for each station } A.$$
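To make the decomposition concrete, here’s a minimal numerical sketch in Python (the synthetic data and all the names are mine, just for illustration; none of this is Hansen’s code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n times at k stations, N = n*k points (no annual cycle).
n, k = 120, 50
station_means = rng.normal(10.0, 5.0, size=k)   # spatial differences between stations
trend = np.linspace(0.0, 1.0, n)[:, None]       # a common warming trend
noise = rng.normal(0.0, 1.0, size=(n, k))       # weather fluctuations

T = station_means[None, :] + trend + noise      # data matrix T[t, A]

T_bar = T.mean(axis=0)                          # mean value at each station A
x = T - T_bar[None, :]                          # local fluctuations x[t, A]

# The station averages of the fluctuations are zero (up to rounding):
print(np.allclose(x.mean(axis=0), 0.0))         # -> True
```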
Temperature anomaly will be the difference between the temperature and the mean value $\mu_A$ during the baseline period at each particular station $A$:

$$a_{tA} = T_{tA} - \mu_A = \delta_A + x_{tA},$$

where for convenience we have defined the differences between the station averages and the anomaly offsets $\mu_A$ (which are the station averages during the baseline period) as

$$\delta_A = \bar{T}_A - \mu_A.$$
Now let’s compute the mean and the variance of the anomalies. The mean will be

$$\bar{a} = \frac{1}{N} \sum_{t,A} a_{tA} = \frac{1}{N} \sum_{t,A} \left( \delta_A + x_{tA} \right) = \frac{1}{k} \sum_{A} \delta_A = \bar{\delta},$$
since the sum of the $x_{tA}$ terms is zero. The mean squared value of the anomalies will be

$$\overline{a^2} = \frac{1}{N} \sum_{t,A} \left( \delta_A + x_{tA} \right)^2 = \frac{1}{N} \sum_{t,A} \delta_A^2 + \frac{2}{N} \sum_{t,A} \delta_A x_{tA} + \frac{1}{N} \sum_{t,A} x_{tA}^2.$$
The middle term vanishes, again because the station sums of the $x_{tA}$ are zero. Hence the mean squared value is

$$\overline{a^2} = \frac{1}{k} \sum_{A} \delta_A^2 + \frac{1}{N} \sum_{t,A} x_{tA}^2,$$

since each $\delta_A$ appears $n$ times in the first sum and $N = kn$.
Therefore the maximum-likelihood estimate of the variance is

$$\hat{\sigma}^2 = \overline{a^2} - \bar{a}^2 = \frac{1}{N} \sum_{t,A} x_{tA}^2 + \left( \frac{1}{k} \sum_{A} \delta_A^2 - \bar{\delta}^2 \right) = \hat{\sigma}_x^2 + \hat{\sigma}_\delta^2.$$
The estimated variance of the data is thus the sum of the variance $\hat{\sigma}_x^2$ of the individual-station fluctuations $x_{tA}$, and the variance $\hat{\sigma}_\delta^2$ of the differences $\delta_A$ between station means and anomaly offsets.
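The decomposition is easy to verify numerically. A sketch with synthetic data (the baseline choice here is arbitrary, just to exercise the identity):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 120, 50
T = (rng.normal(10.0, 5.0, size=k)[None, :]
     + np.linspace(0.0, 1.0, n)[:, None]
     + rng.normal(0.0, 1.0, size=(n, k)))

baseline = slice(0, 30)              # a hypothetical baseline: the first 30 time steps
mu = T[baseline].mean(axis=0)        # anomaly offsets mu_A
a = T - mu[None, :]                  # anomalies a[t, A]

T_bar = T.mean(axis=0)               # station means over the full span
delta = T_bar - mu                   # differences delta_A
x = T - T_bar[None, :]               # fluctuations x[t, A]

# np.var uses the 1/N (maximum-likelihood) normalization by default,
# so the identity sigma^2 = sigma^2_x + sigma^2_delta holds to rounding error:
print(np.isclose(a.var(), x.var() + delta.var()))   # -> True
```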
It’s that last part which makes the variance baseline-dependent. In particular, if the baseline period is the same as the time span we’re averaging over, then every station average $\bar{T}_A$ will equal its corresponding anomaly offset $\mu_A$, so all the differences $\delta_A$ will be zero and the estimated variance will be at its minimum. If, on the other hand, the differences between station means and anomaly offsets show large variance because different stations have warmed by different amounts between the baseline and observation intervals, then the last term will greatly inflate the estimated data variance.
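Here’s a sketch of the effect, using synthetic data in which stations warm at different rates (the span and baseline choices are mine, loosely mimicking a decade analyzed against an earlier base period):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 720, 50                                   # e.g. 60 years of monthly data at 50 stations
rates = rng.uniform(0.0, 2.0, size=k)            # stations warm at different rates
T = (rng.normal(10.0, 5.0, size=k)[None, :]      # spatial station means
     + np.linspace(0.0, 1.5, n)[:, None] * rates[None, :]   # differing warming trends
     + rng.normal(0.0, 1.0, size=(n, k)))        # weather fluctuations

span = slice(600, 720)                           # the decade being analyzed
for label, baseline in [("early baseline:     ", slice(0, 360)),
                        ("baseline = own span:", span)]:
    mu = T[baseline].mean(axis=0)                # anomaly offsets for this baseline
    a = T[span] - mu[None, :]                    # anomalies over the analysis span
    print(label, f"anomaly variance = {a.var():.3f}")
```

With the early baseline, the uneven warming puts nonzero $\delta_A$ into the anomalies and the variance comes out larger; with the span as its own baseline, only the fluctuations remain.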
However, if we want to know whether or not the weather (not climate) is getting more variable, then we really want to isolate the individual-station fluctuations $x_{tA}$. Therefore I submit that in order to estimate the distribution of the temperature anomalies for the purpose of gauging temperature variability during some time span (perhaps each decade, or a set of 11-year periods as in Hansen et al.), the baseline for the anomaly calculation should be the same as the time span being analyzed.
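In code, the proposed rule amounts to using each span’s own station means as its anomaly offsets. A sketch (the function name and interface are mine, not from Hansen et al.):

```python
import numpy as np

def span_anomalies(T, spans):
    """For each time span, compute anomalies using that span's own station
    means as the baseline, so only the fluctuations x[t, A] remain."""
    out = {}
    for name, sl in spans.items():
        mu = T[sl].mean(axis=0)          # baseline = the analysis span itself
        out[name] = T[sl] - mu[None, :]  # anomalies with delta_A = 0 by construction
    return out

# Example: per-decade anomalies from 60 years of monthly data at 50 stations.
rng = np.random.default_rng(3)
T = rng.normal(size=(720, 50))
decades = {f"decade {i}": slice(120 * i, 120 * (i + 1)) for i in range(6)}
for name, a in span_anomalies(T, decades).items():
    print(name, f"variance = {a.var():.3f}")
```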