# Aligning Tide Gauge Stations

I’ve spoken before about my new way of aligning tide gauge stations. Maybe it’s time for me to outline some of the details, and share my program (written in R) for doing the computation.

Warning: there’s math ahead. If you want to use the program but you really don’t want to read a lot of equations, skip ahead to “How to Run the Program” and have some fun.

Tide gauge data certainly give us information about the height of the sea surface (SSH). But their data are relative to a completely arbitrary “zero point” so to combine data from different stations, we need to estimate how far they’re offset from each other (we’ll call it the “offset”). If we have $N$ stations, there will be a separate offset for each. We’ll call them $\mu_A$, where the index $A$ runs from 1 to $N$ (the number of stations we’re combining).

The idea is that the data at time $t_\alpha$ (where the index $\alpha$ runs from 1 to the number of observation times) isn’t just some oceanic signal, it also includes the offsets, and there’s noise in the measurement to boot, so

$x_{\alpha A} = \gamma_\alpha + \mu_A + \varepsilon_{\alpha A}$.

Where $x_{\alpha A}$ is the observation from station $A$ at time $t_\alpha$, $\gamma_\alpha$ is the oceanic signal at time $t_\alpha$, $\mu_A$ is the offset for station A, and $\varepsilon_{\alpha A}$ is the noise value at time $t_\alpha$ from statino $A$.

We can estimate the offsets by defining a quantity, the sum of squares of differences of each observation from the momentary “global” average

$Q = \Sigma_{stations~A} ~ \Sigma_{times~t_\alpha} ~ ( x_{\alpha A} - \gamma_\alpha - \mu_A )^2$.

We then find the quantities which minimize this, which means finding not only the offsets $\mu_A$, but the “global average” $\gamma_\alpha$ as well.

This is my old way of doing things, which I’ve sometimes called the “Berkeley method.” In my opinion it’s the best way to align many things, including temperature data from different stations.

But tide gauges are different. There’s another complication called vertical land movement (VLM). If the land rises or falls, the sea appears to fall or rise because tide gauges don’t measure absolute sea level, they measure relative sea level.

A saving grace about VLM is that at most stations, it goes at a steady pace. If at station $A$ the land is falling at a rate $\beta_A$, then relative sea level will show an artificial rise at a rate of $\beta_A$. Our new model is that the data at a given time $t_\alpha$, at a given station $A$, is given by

$x_{\alpha A} = \gamma_{\alpha} + \mu_A + \beta_A t_\alpha + \varepsilon_{\alpha A}$.

Where $x_{\alpha A}$ is the observation from station $A$ at time $t_\alpha$, $\gamma_\alpha$ is the oceanic signal at time $t_\alpha$, $\mu_A$ is the offset for station $A$, $\beta_A$ is the negative of the rate of vertical land movement at station $A$, and $\varepsilon_{\alpha A}$ is the noise value at time $t_\alpha$ from statino $A$.

Now we minimize a quantity similar to the one before, but including the effect of VLM:

$Q = \sum_A \sum_\alpha ( x_{\alpha A} - \gamma_\alpha - \mu_A - \beta_A t_\alpha )^2$.

In addition to finding not only the offsets $\mu_A$ and the “global average” $\gamma_\alpha$, we also find the VLM-induced rates $\beta_A$.

I have most recently introduced one additional complication. Some stations show a larger noise level than others, and I wanted to use a weighted mean for the global average, to take this into account. The new model requires the quantity to be minimized is

$Q = \sum_A \sum_\alpha \frac{1}{\sigma_A^2} ( x_{\alpha A} - \gamma_\alpha - \mu_A - \beta_A t_\alpha )^2$,

where $\sigma_A$ is the noise level associated with station $A$.

How to Run the Program

vw.align is an R program to align records from different stations which do not all have observations at the same set of times. The basic inputs are:

tall — All the times for all the observations. There will be identical values
when multiple stations report at the same time.
xall — All the observations.
part — Station identifier for each observation. This can be a number or a name, as long as it’s unique for each individual station.

Optional paramters:

sloped — Should the alignment include a separate slope for each station?
Defaults to TRUE.
weighted — Should the computation use weighting (to downweight stations with
large variance)? Defaults to FALSE.
offres –Resolution in station offsets required for convergence. Defaults to 0.001.
Lower means more resolution and takes longer to compute; higher means less
resolution and computes faster.
full — Should it report input data and “adjusted” data for individual stations?
Defaults to FALSE.
tround — Rounding for input times, before matching times to identify when
stations are reporting the “same” time. Defaults to NULL.
xround — Rounding for output values. Defaults to NULL

The output is a list containing:

data — Results, including time, averaged value, its standard error,
and the number of stations reporting for each time.
offset — Offset for each individual station
slopes — Slope for each individual station (if used)
weights — Weights for each individual station (if used)

In addition, there are two further entries if the option “full” is set to “TRUE”

raw — Raw input data
adj — Station data corrected for offset and individual slope

Here is a tiny R script to run an example for you:

source(“vw_align.r”)

t1 = 0:999; x1 = rnorm(1000); p1 = rep(“A”,1000)
t2 = 100:899; x2 = rnorm(800)+.01*t2; p2 = rep(“B”,800)
t3 = 100:899; x3 = rnorm(800,,10)-.01*t3; p3 = rep(“C”,800)

plot(t1,x1,ylim=range(x1,x2,x3))
lines(t2,x2,col=”red”)
lines(t3,x3,col=”blue”)

tall = c(t1,t2,t3)
xall = c(x1,x2,x3)
part = c(p1,p2,p3)

q1 = vw.align(tall,xall,part,sloped=FALSE)
q2 = vw.align(tall,xall,part)
q3 = vw.align(tall,xall,part,weighted=TRUE,full=TRUE)

t = q1$d$t; x = q1$d$x; y = q2$d$x; w = q3$d$x

plot(t,x,type=”l”,col=”cornsilk4″)
lines(t,y,col=”red”)
lines(t,w,col=”blue”)

The Program

Wordpress kept telling me “this file type is not allowed for security reasons.” So here it is as inline text; copy and paste into your own file (which I call “vw_align.r”)

##################################
##################################
# vw.align
# combine station data
# t is time
# x is data
# part is station ID
# offres is resolution for offsets
# tround is rounding digits for t
# xround is rounding digits for x
#
# with options to use
# using variable slope by station
# and variable weight by station
##################################
##################################
vw.align = function(t,x,part,
sloped=TRUE,weighted=FALSE,full=FALSE,
tround=NULL,xround=4,offres=.001){
##########################
# make “part” a factor
##########################
part = as.factor(part)
nparts = length(levels(part))
if (nparts < 2){
zout = data.frame(t,x)
zout = zout[order(t),]
zout = list(data=zout,offsets=0)
} else{
#######################
# form giant data frame
#######################
for (jpart in 1:nparts){
tt = t[part == levels(part)[jpart]]
xx = x[part == levels(part)[jpart]]
ndx = order(tt)
xx = xx[ndx]
tt = tt[ndx]
if (length(tround) > 0){
tt = round(tt,tround)
}
z1 = data.frame(tt,xx)
names(z1) = c(“t”,levels(part)[jpart])
if (jpart == 1){
zz = z1
} else{
zz = merge(zz,z1,by=1,all=T)
}
}
###############################
# iteratively determine offsets
# and merged values
###############################
zdat = zz[,2:(nparts+1)]
zdat = as.matrix(zdat)
ntimes = dim(zz)[1]
tau = zz[,1]
tau = tau-mean(tau)
################
# initial values
################
mu = rep(0,nparts) # station A: zero point offset
beta = rep(0,nparts) # station A: slope offset
wate = rep(1,nparts) # station A: statistical weight
gam = rep(NA,ntimes) # time alpha: global mean
gam2 = gam # UNweighted mean when gam is weighted
dmu = 9999; dbeta = 9999
######################
# do until convergence
######################
while (dmu > offres){
for (jj in 1:ntimes){
xx = zdat[jj,]
xx = xx-mu-beta*tau[jj]
if (weighted){
gam[jj] = weighted.mean(xx,wate,na.rm=TRUE)
gam2[jj] = mean(xx,na.rm=TRUE)
} else{
gam[jj] = mean(xx,na.rm=TRUE)
}
}
oldmu = mu
if (sloped){
for (jj in 1:nparts){
xx = zdat[,jj]-gam
xxfit = lm(xx~tau)
mu[jj] = xxfit$coef[1] beta[jj] = xxfit$coef[2]
}
} else{
for (jj in 1:nparts){
xx = zdat[,jj]-gam
mu[jj] = mean(xx,na.rm=TRUE)
}
}
mu = mu-median(mu)
beta = beta-median(beta)
#####################
# compute new weights
#####################
if (weighted){
for (jj in 1:nparts){
xx = zdat[,jj]-gam2-mu[jj]-beta[jj]*tau
wate[jj] = 1/var(xx,na.rm=TRUE)
}
}
#########################
# difference in mu values
#########################
dmu = mean(abs(mu-oldmu))
}
############
# final mean
############
num = rep(NA,ntimes)
se = rep(NA,ntimes)
std.dev = rep(NA,ntimes)
for (jj in 1:ntimes){
xx = zdat[jj,]
xx = xx-mu-beta*tau[jj]
if (weighted){
gam[jj] = weighted.mean(xx,wate,na.rm=TRUE)
} else{
gam[jj] = mean(xx,na.rm=TRUE)
}
xx = xx[is.finite(xx)]
num[jj] = length(xx)
std.dev[jj] = sd(xx)
}
se = std.dev/sqrt(num)
zout = data.frame(x=gam,num,se,std.dev)
if (length(xround) > 0){
zout = round(zout,xround)
}
zout = data.frame(t=zz[,1],zout)
if (full){
zd = data.frame(zdat)
names(zd) = levels(part)
zd2 = zd
for (jj in 1:nparts){
zd2[,jj] = zd2[,jj]-mu[jj]-beta[jj]*tau
}
zout = list(data=zout,offsets=mu,slopes=beta,weights=wate,
} else{
zout = list(data=zout,offsets=mu,slopes=beta,weights=wate)
}
}
zout
}

Final note: If you’d like to make a donation to this blog, this would be a particularly good time to follow the link below.

This blog is made possible by readers like you; join others by donating at My Wee Dragon.

### 21 responses to “Aligning Tide Gauge Stations”

1. Question, and I haven’t gone through the code to see …

Do you do anything in the resulting expansion to keep, for instance, the $\gamma_{\alpha}^{2}$ and $\beta_{A}^{2}t_{\alpha}^2$ terms identifiable? In other words, it seems when minimizing $Q$ there could be a path along the surface, given constant $A$, where $\gamma_{\alpha}^{2}$ and $\beta_{A}^{2}t_{\alpha}^2$ trade off against one another keeping $Q$ constant. There are other cross terms where the same might be true.

In other applications, identifiability is often handled by a change of variables where, using the notation here, reference $\gamma_{\alpha_{0}}$ is chosen and then others are re-expressed as $\gamma_{\alpha} = \Delta \gamma_{\alpha} + \gamma_{\alpha_{0}}$, but I’m not sure how that would work here?

Are there separate constraints on these? Priors perhaps?

I probably should just sit down and study the code, but if this kind of thing is in there it would be good to have some explanation in the text.

Just my two cents.

Thanks.

[Response: I should have provided more details. There are ambiguities, because adding a constant to all the offsets while subtracting it from the global signal, gives the Q value unchanged. Also, adding a constant to all the slopes leaves Q unchanged.

Fortunately one gets around that by imposing simple constraints. I originally constrained the constants (offset and slope) to be zero for the first station, but now I constrain the median offset and median slope to be zero.]

2. Owch. Is it possible to do this in a way that doesn’t toast all the indents?

[Response: Not easily on wordpress. Let me see what I can come up with.]

3. Tamino

Isn’t it possible to load txt files into e.g. Libre Office Writer, from which you may obtain a pdf file, and finally to upload the latter via Google Drive or similar?

I know it’s ugly but it works, like here:

4. AJ

Yet another reconstruction. This one employing kriging:

• AJ

Could you please upload the list of those PMSL stations which survived all your successive filtering operations (in a form allowing me to quickly extract them out of PMSL’s station file (their id is enough for that)? I’m highly interested in it.

I start from the assumption that you too use only RLR records.

• AJ

With regard to filtering, yes it’s always a question if you’re filtering noise or information. More high quality data would be nice. Most stations were filtered due to lack of GPS. I also tried including stations with a low seismic hazard, without VLM adjustment, but this led to a significant variogram nugget.

• AJ [2]

My reason to ask for your final PMSL station subset is the amazing difference between your gauge selection:

and the available set of all stations giving valuable data even if for short periods:

It is clearly visible that your successive flitering steps eliminate a great part of the Globe’s coastal corners, what might create a far bigger flaw than the sum of those you tried to get rid of.

• AJ

There is a link to source code and intermediary files provided on the page
The file rlr-obs-df.rds is a data frame which contains psmsl station id.

• AJ

Thanks, that I had imagined already, but… not everybody on Earth uses R :-(.
I’m a good old C-imple-minded programmer.

• AJ [4]

I forgot above all to mention that the file contents of the directory

and those of its subdirectory ‘source’ are not accessible.

• AJ

Crap. I’ll fix that next week when I have access to my laptop.

• AJ

Bindidon… I updated the link and tried in incognito mode. I got a network error message, but was still able to download. Not sure the reason for the difficulties. Thought for sure I had tested it previously.

• AJ [5]

Sorry, still no change on my side. Only the ‘source’ directory is accessible within ‘slr-kriged.zip’, but no file contents are either.

• AJ [end]

“.rds files can’t be previewed.”
I give it up: it’s too boring to help you understanding even simplest things.

• @bindidon,

You can be as intransigent about SLR and climate as you want, but I draw the line at insulting R‘s .rds files!

• AJ

bindidon [out]

• ecoquant

“… but I draw the line at insulting R‘s .rds files!”

Wow. Sir, I propose that you read the sequence a bit more carefully before writing such unnecessary kauderwelsch.

Why the heck should I little layman insult a worldwide known programming language or one of its technical components?

• @Bindidon,

Wow. Sir, I propose that you read the sequence a bit more carefully before writing such unnecessary kauderwelsch.

Because I was making a joke.

5. The code could be tossed into a Github as it popular these days. I just put it into a Google Drive directory … Google even gives you some version control.

6. It would be very interesting to obtain a sorted trend chart for PMSL stations like this one