Aligning Tide Gauge Stations

I’ve spoken before about my new way of aligning tide gauge stations. Maybe it’s time for me to outline some of the details, and share my program (written in R) for doing the computation.


Warning: there’s math ahead. If you want to use the program but you really don’t want to read a lot of equations, skip ahead to “How to Run the Program” and have some fun.


Velocity-Adjusted, Weighted Alignment

Tide gauge data certainly give us information about sea surface height (SSH). But the data are relative to a completely arbitrary “zero point,” so to combine data from different stations we need to estimate how far they’re offset from each other (we’ll call this the “offset”). If we have N stations, there will be a separate offset for each; we’ll call them \mu_A, where the index A runs from 1 to N.

The idea is that the data at time t_\alpha (where the index \alpha runs from 1 to the number of observation times) isn’t just some oceanic signal; it also includes the station’s offset, and there’s noise in the measurement to boot, so

x_{\alpha A} = \gamma_\alpha + \mu_A + \varepsilon_{\alpha A}.

where x_{\alpha A} is the observation from station A at time t_\alpha, \gamma_\alpha is the oceanic signal at time t_\alpha, \mu_A is the offset for station A, and \varepsilon_{\alpha A} is the noise value at time t_\alpha from station A.

We can estimate the offsets by defining a quantity, the sum of squares of differences of each observation from the momentary “global” average

Q = \sum_{stations~A} ~ \sum_{times~\alpha} ~ ( x_{\alpha A} - \gamma_\alpha - \mu_A )^2 .

We then find the quantities which minimize this, which means finding not only the offsets \mu_A, but the “global average” \gamma_\alpha as well.
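
To spell out a step I glossed over: setting the partial derivatives of Q to zero gives

\hat\gamma_\alpha = \frac{1}{N_\alpha} \sum_A ( x_{\alpha A} - \mu_A ), \qquad \hat\mu_A = \frac{1}{n_A} \sum_\alpha ( x_{\alpha A} - \gamma_\alpha ),

where N_\alpha is the number of stations reporting at time t_\alpha and n_A is the number of observations from station A. Each estimate depends on the other, so in practice we iterate between the two until the offsets stop changing. There’s one ambiguity: adding a constant to every offset while subtracting it from every \gamma_\alpha leaves Q unchanged, so a simple constraint (the program forces the median offset to zero) is needed to pin down a unique solution.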

This is my old way of doing things, which I’ve sometimes called the “Berkeley method.” In my opinion it’s the best way to align many things, including temperature data from different stations.

But tide gauges are different. There’s another complication called vertical land movement (VLM). If the land rises or falls, the sea appears to fall or rise because tide gauges don’t measure absolute sea level, they measure relative sea level.

A saving grace about VLM is that at most stations, it goes at a steady pace. If at station A the land is falling at a rate \beta_A, then relative sea level will show an artificial rise at a rate of \beta_A. Our new model is that the data at a given time t_\alpha, at a given station A, is given by

x_{\alpha A} = \gamma_{\alpha} + \mu_A + \beta_A t_\alpha + \varepsilon_{\alpha A}.

where x_{\alpha A} is the observation from station A at time t_\alpha, \gamma_\alpha is the oceanic signal at time t_\alpha, \mu_A is the offset for station A, \beta_A is the negative of the rate of vertical land movement at station A, and \varepsilon_{\alpha A} is the noise value at time t_\alpha from station A.

Now we minimize a quantity similar to the one before, but including the effect of VLM:

Q = \sum_A \sum_\alpha ( x_{\alpha A} - \gamma_\alpha - \mu_A - \beta_A t_\alpha )^2.

In addition to the offsets \mu_A and the “global average” \gamma_\alpha, we now also find the VLM-induced rates \beta_A.
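
In practice, for a fixed global average \gamma_\alpha, the best-fit \mu_A and \beta_A for station A are just the intercept and slope of an ordinary least-squares regression of x_{\alpha A} - \gamma_\alpha against time:

( \hat\mu_A, \hat\beta_A ) = \arg\min_{\mu,\beta} \sum_\alpha ( x_{\alpha A} - \gamma_\alpha - \mu - \beta t_\alpha )^2 .

That’s exactly what the program does at each iteration (one call to lm per station), after which it re-centers by forcing the median offset and the median slope to zero, since adding a constant to all the offsets, or a constant trend to all the slopes, could otherwise be traded off against the global average.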

I have most recently introduced one additional complication. Some stations show a larger noise level than others, and I wanted to use a weighted mean for the global average to take this into account. In the new model, the quantity to be minimized is

Q = \sum_A \sum_\alpha \frac{1}{\sigma_A^2} ( x_{\alpha A} - \gamma_\alpha - \mu_A - \beta_A t_\alpha )^2,

where \sigma_A is the noise level associated with station A.
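
Of course the noise levels \sigma_A aren’t known in advance; the program estimates them from the data. Minimizing the weighted Q with respect to \gamma_\alpha gives a weighted mean,

\hat\gamma_\alpha = \frac{ \sum_A w_A ( x_{\alpha A} - \mu_A - \beta_A t_\alpha ) }{ \sum_A w_A }, \qquad w_A = 1/\sigma_A^2 ,

and at each iteration \sigma_A^2 is re-estimated as the variance of station A’s residuals (in the program, those residuals are taken relative to an unweighted version of the global average).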


How to Run the Program

vw.align is an R program to align records from different stations which do not all have observations at the same set of times. The basic inputs are:

tall — All the times for all the observations. There will be identical values when multiple stations report at the same time.
xall — All the observations.
part — Station identifier for each observation. This can be a number or a name, as long as it’s unique for each individual station.

Optional parameters:

sloped — Should the alignment include a separate slope for each station? Defaults to TRUE.
weighted — Should the computation use weighting (to downweight stations with large variance)? Defaults to FALSE.
offres — Resolution in station offsets required for convergence. Defaults to 0.001. Lower means more resolution and takes longer to compute; higher means less resolution and computes faster.
full — Should it report input data and “adjusted” data for individual stations? Defaults to FALSE.
tround — Rounding for input times, before matching times to identify when stations are reporting the “same” time. Defaults to NULL (no rounding).
xround — Rounding for output values. Defaults to 4.

The output is a list containing:

data — Results: for each time, the aligned average, its standard error and standard deviation, and the number of stations reporting.
offsets — Offset for each individual station.
slopes — Slope for each individual station (all zero if sloped=FALSE).
weights — Weight for each individual station (all 1 if weighted=FALSE).

In addition, there are two further entries if the option “full” is set to TRUE:

raw — Raw input data
adj — Station data corrected for offset and individual slope

Here is a tiny R script to run an example for you:

source("vw_align.r")

t1 = 0:999; x1 = rnorm(1000); p1 = rep("A",1000)
t2 = 100:899; x2 = rnorm(800)+.01*t2; p2 = rep("B",800)
t3 = 100:899; x3 = rnorm(800,,10)-.01*t3; p3 = rep("C",800)

plot(t1,x1,ylim=range(x1,x2,x3))
lines(t2,x2,col="red")
lines(t3,x3,col="blue")

tall = c(t1,t2,t3)
xall = c(x1,x2,x3)
part = c(p1,p2,p3)

q1 = vw.align(tall,xall,part,sloped=FALSE)
q2 = vw.align(tall,xall,part)
q3 = vw.align(tall,xall,part,weighted=TRUE,full=TRUE)

t = q1$data$t; x = q1$data$x; y = q2$data$x; w = q3$data$x

plot(t,x,type="l",col="cornsilk4")
lines(t,y,col="red")
lines(t,w,col="blue")
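
Besides the aligned series, the returned list also holds the per-station estimates, so you can check how well they’re recovered (these element names come straight from the function’s output):

q2$offsets   # zero-point offset for each of stations A, B, C
q2$slopes    # per-station slopes; B and C should come out near +0.01 and -0.01
q3$weights   # station C, with ten times the noise, should get a much smaller weight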


The Program

WordPress kept telling me “this file type is not allowed for security reasons.” So here it is as inline text; copy and paste it into your own file (which I call “vw_align.r”).

##################################
##################################
# vw.align
# combine station data
# t is time
# x is data
# part is station ID
# offres is resolution for offsets
# tround is rounding digits for t
# xround is rounding digits for x
#
# with options to use
# using variable slope by station
# and variable weight by station
##################################
##################################
vw.align = function(t,x,part,
                    sloped=TRUE,weighted=FALSE,full=FALSE,
                    tround=NULL,xround=4,offres=.001){
  ##########################
  # make "part" a factor
  ##########################
  part = as.factor(part)
  nparts = length(levels(part))
  if (nparts < 2){
    zout = data.frame(t,x)
    zout = zout[order(t),]
    zout = list(data=zout,offsets=0)
  } else{
    #######################
    # form giant data frame
    #######################
    for (jpart in 1:nparts){
      tt = t[part == levels(part)[jpart]]
      xx = x[part == levels(part)[jpart]]
      ndx = order(tt)
      xx = xx[ndx]
      tt = tt[ndx]
      if (length(tround) > 0){
        tt = round(tt,tround)
      }
      z1 = data.frame(tt,xx)
      names(z1) = c("t",levels(part)[jpart])
      if (jpart == 1){
        zz = z1
      } else{
        zz = merge(zz,z1,by=1,all=T)
      }
    }
    ###############################
    # iteratively determine offsets
    # and merged values
    ###############################
    zdat = zz[,2:(nparts+1)]
    zdat = as.matrix(zdat)
    ntimes = dim(zz)[1]
    tau = zz[,1]
    tau = tau-mean(tau)
    ################
    # initial values
    ################
    mu = rep(0,nparts)   # station A: zero point offset
    beta = rep(0,nparts) # station A: slope offset
    wate = rep(1,nparts) # station A: statistical weight
    gam = rep(NA,ntimes) # time alpha: global mean
    gam2 = gam           # UNweighted mean when gam is weighted
    dmu = 9999; dbeta = 9999
    ######################
    # do until convergence
    ######################
    while (dmu > offres){
      for (jj in 1:ntimes){
        xx = zdat[jj,]
        xx = xx-mu-beta*tau[jj]
        if (weighted){
          gam[jj] = weighted.mean(xx,wate,na.rm=TRUE)
          gam2[jj] = mean(xx,na.rm=TRUE)
        } else{
          gam[jj] = mean(xx,na.rm=TRUE)
        }
      }
      oldmu = mu
      if (sloped){
        for (jj in 1:nparts){
          xx = zdat[,jj]-gam
          xxfit = lm(xx~tau)
          mu[jj] = xxfit$coef[1]
          beta[jj] = xxfit$coef[2]
        }
      } else{
        for (jj in 1:nparts){
          xx = zdat[,jj]-gam
          mu[jj] = mean(xx,na.rm=TRUE)
        }
      }
      mu = mu-median(mu)
      beta = beta-median(beta)
      #####################
      # compute new weights
      #####################
      if (weighted){
        for (jj in 1:nparts){
          xx = zdat[,jj]-gam2-mu[jj]-beta[jj]*tau
          wate[jj] = 1/var(xx,na.rm=TRUE)
        }
      }
      #########################
      # difference in mu values
      #########################
      dmu = mean(abs(mu-oldmu))
    }
    ############
    # final mean
    ############
    num = rep(NA,ntimes)
    se = rep(NA,ntimes)
    std.dev = rep(NA,ntimes)
    for (jj in 1:ntimes){
      xx = zdat[jj,]
      xx = xx-mu-beta*tau[jj]
      if (weighted){
        gam[jj] = weighted.mean(xx,wate,na.rm=TRUE)
      } else{
        gam[jj] = mean(xx,na.rm=TRUE)
      }
      xx = xx[is.finite(xx)]
      num[jj] = length(xx)
      std.dev[jj] = sd(xx)
    }
    se = std.dev/sqrt(num)
    zout = data.frame(x=gam,num,se,std.dev)
    if (length(xround) > 0){
      zout = round(zout,xround)
    }
    zout = data.frame(t=zz[,1],zout)
    if (full){
      zd = data.frame(zdat)
      names(zd) = levels(part)
      zd2 = zd
      for (jj in 1:nparts){
        zd2[,jj] = zd2[,jj]-mu[jj]-beta[jj]*tau
      }
      zout = list(data=zout,offsets=mu,slopes=beta,weights=wate,
                  raw=zd,adj=zd2)
    } else{
      zout = list(data=zout,offsets=mu,slopes=beta,weights=wate)
    }
  }
  zout
}

Final note: If you’d like to make a donation to this blog, this would be a particularly good time to follow the link below.


This blog is made possible by readers like you; join others by donating at My Wee Dragon.



21 responses to “Aligning Tide Gauge Stations”

  1. Question, and I haven’t gone through the code to see …

    Do you do anything in the resulting expansion to keep, for instance, the \gamma_{\alpha}^{2} and \beta_{A}^{2}t_{\alpha}^2 terms identifiable? In other words, it seems when minimizing Q there could be a path along the surface, given constant A, where \gamma_{\alpha}^{2} and \beta_{A}^{2}t_{\alpha}^2 trade off against one another keeping Q constant. There are other cross terms where the same might be true.

    In other applications, identifiability is often handled by a change of variables where, using the notation here, reference \gamma_{\alpha_{0}} is chosen and then others are re-expressed as \gamma_{\alpha} = \Delta \gamma_{\alpha}  + \gamma_{\alpha_{0}}, but I’m not sure how that would work here?

    Are there separate constraints on these? Priors perhaps?

    I probably should just sit down and study the code, but if this kind of thing is in there it would be good to have some explanation in the text.

    Just my two cents.

    Thanks.

    [Response: I should have provided more details. There are ambiguities: adding a constant to all the offsets while subtracting it from the global signal leaves the value of Q unchanged, and likewise adding a constant to all the slopes while subtracting the corresponding trend from the global signal.

    Fortunately one gets around that by imposing simple constraints. I originally constrained the constants (offset and slope) to be zero for the first station, but now I constrain the median offset and median slope to be zero.]

  2. Owch. Is it possible to do this in a way that doesn’t toast all the indents?

    [Response: Not easily on wordpress. Let me see what I can come up with.]

  3. Tamino

    Isn’t it possible to load txt files into e.g. Libre Office Writer, from which you may obtain a pdf file, and finally to upload the latter via Google Drive or similar?

    I know it’s ugly but it works, like here:
    https://drive.google.com/file/d/1kLZZQH-zQjkWMvwT72JxAM34cLPOUjMr/view

  4. The code could be tossed into a GitHub repository, as is popular these days. I just put it into a Google Drive directory … Google even gives you some version control.

  5. It would be very interesting to obtain a sorted trend chart for PSMSL stations like this one

    https://drive.google.com/file/d/1MKN4zBzgKIwM9BPBOS6M7Vwoqzm7ahSi/view

    but then including all corrections for vertical land movement.

    The chart above compares, for the 679 (of 1513) PSMSL stations with sufficient data to compute anomalies for the reference period 1993-2013, the trends for (1) their entire lifetime and (2) the reference period.

    Trends below -10 mm/yr and above +10 mm/yr were excluded in this chart (this was of course an arbitrary decision).

    The lifetime average trend was 2.1 mm/yr regardless of trend exclusion; the average trend for the reference period was 3.1 mm/yr without, and 2.7 mm/yr with, trend exclusion.