In the last post we showed how Harold Brooks has applied a 1st-order Markov Chain model to the phenomenon of a significant tornado day (“STD”), in particular to explain the frequency of occurrence of long runs of consecutive STDs. An STD is defined as any day with at least one (possibly many more) tornados of strength F2 or greater (on the Fujita scale).
The 1st-order Markov model did a good job, whereas a bare-probability model does not. In the bare-probability model, the probability that any given day is an STD can depend on the time of year (May is the peak time of year for tornado probability), but does not depend on whether previous days had significant tornados. However, there are too many long runs of consecutive STDs (as many as 9 in a row in the data used by Brooks, as many as 12 in a row in the NOAA-NWS data) for the bare-probability model to be correct.
In the 1st-order Markov model, the probability that today is an STD can depend, not only on the time of year, but also on whether or not yesterday was an STD. If it was, then the probability that today will be an STD is enhanced. So this model has two probabilities (both of which depend on the time of year): $p_0$ is the probability of an STD if yesterday was not, while $p_1$ is the probability of an STD if yesterday was. There are two probabilities (called “transition probabilities”) because there are two possible states for yesterday: either 0 (non-STD) or 1 (STD). The model is a 1st-order model because it depends on only 1 previous state (yesterday, but not days before that).
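To make the two transition probabilities concrete, here is a minimal sketch of how they could be estimated from a daily record coded as 0 (non-STD) and 1 (STD). It is an illustration only: the function name `first_order_probs` and the short record `days` are made up, and the sketch ignores the time-of-year dependence by pooling all days together.

```python
from collections import Counter

def first_order_probs(days):
    """Estimate P(today is an STD | yesterday's state) from a 0/1 sequence.

    Simplifying assumption: the probabilities do not vary with time of year.
    """
    counts = Counter()  # how often each state of yesterday occurs
    hits = Counter()    # how often that state is followed by an STD
    for yesterday, today in zip(days, days[1:]):
        counts[yesterday] += 1
        hits[yesterday] += today
    # Only report states that actually occur in the record
    return {s: hits[s] / counts[s] for s in (0, 1) if counts[s]}

# Made-up record, for illustration only (1 = STD, 0 = non-STD)
days = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0]
probs = first_order_probs(days)
print(probs)
```

To recover a seasonal dependence, one could apply the same counting separately within each month or within a sliding window of calendar days.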
While the 1st-order model does much better than the bare-probability model, the observed number of very long runs is still a bit more than the model indicates. Therefore I decided to explore another possibility. You may already have guessed that I looked at the 2nd-order Markov Chain model.
In the 2nd-order model, the probability today will be an STD depends not only on the time of year, but on whether or not the previous two days were STDs. There are four possible states for the previous two days: “00” (neither was an STD), “01” (two days ago was not but yesterday was), “10” (two days ago was but yesterday was not), and “11” (both yesterday and the day before that were STDs). This means there are four (time-of-year dependent) transition probabilities: $p_{00}$, $p_{01}$, $p_{10}$, and $p_{11}$, giving the probability today is an STD for each possible state of the preceding two days.
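The same counting idea extends to the four two-day states. The sketch below is a hypothetical illustration (the name `second_order_probs` and the record `days` are invented, and the time-of-year dependence is again ignored): it labels each pair of preceding days as “00”, “01”, “10” or “11” and tallies how often an STD follows.

```python
from collections import Counter

def second_order_probs(days):
    """Estimate P(today is an STD | state of the previous two days).

    States are strings: first character is two days ago, second is yesterday.
    Simplifying assumption: the probabilities do not vary with time of year.
    """
    counts = Counter()  # occurrences of each two-day state
    hits = Counter()    # occurrences followed by an STD
    for two_ago, yesterday, today in zip(days, days[1:], days[2:]):
        state = f"{two_ago}{yesterday}"
        counts[state] += 1
        hits[state] += today
    return {s: hits[s] / counts[s]
            for s in ("00", "01", "10", "11") if counts[s]}

# Made-up record, for illustration only (1 = STD, 0 = non-STD)
days = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0]
probs = second_order_probs(days)
print(probs)
```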
These probabilities certainly exist, whether the process follows a 2nd-order Markov Chain model or not! It’s worth taking note of the fact that if the process follows a 1st-order Markov Chain model, then the probability today is an STD doesn’t depend on the state two days ago. This would mean that the probabilities $p_{00}$ and $p_{10}$ (yesterday was not an STD) must be the same, equal to $p_0$ of the 1st-order model, and also that the probabilities $p_{01}$ and $p_{11}$ (yesterday was an STD) are the same, equal to the probability $p_1$ of the 1st-order model. If we can show that these equivalences do not hold, then we have managed to disprove the 1st-order Markov Chain model, although that will not undermine its usefulness, nor will it prove the 2nd-order Markov (or any other) model.
I took the NOAA-NWS data and used it to estimate all four transition probabilities. Here’s the result:
Not only are there differences between $p_{00}$ and $p_{10}$, not only are there differences between $p_{01}$ and $p_{11}$, those differences are statistically significant. This effectively disproves the 1st-order Markov model (but as I said, doesn’t undermine its usefulness nor does it prove the 2nd-order model correct).
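For readers who want to see what a significance check on two estimated probabilities might look like, here is a sketch of a pooled two-proportion z-test. This is one common choice, not necessarily the test behind the statement above, and it treats the counted days as independent trials, which is only an approximation for a Markov process. The function name and the counts in the example are hypothetical.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    hits_x / n_x is the estimated probability for state x, e.g. the number
    of STDs following state "10" divided by the number of "10" occurrences.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, not taken from the NOAA-NWS data
z, p_value = two_proportion_z(hits_a=30, n_a=100, hits_b=100, n_b=1000)
print(z, p_value)
```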
It’s quite interesting (and counterintuitive) that early in the tornado season (during March), $p_{01}$ is greater than $p_{11}$. This means that if yesterday was an STD, then today is more likely to be an STD if two days ago was not than if it was. During the heart of tornado season, $p_{11}$ is greater than $p_{01}$, so today is more likely to be an STD if both yesterday and the day before were, than if only yesterday was. Also, during most of the year (and almost all of the 2nd half of the year), the difference between $p_{01}$ and $p_{11}$ is not significant, which is what we would expect from the 1st-order Markov model.
Throughout the entire year, $p_{10}$ is greater than $p_{00}$. This means that even if yesterday was not an STD, if two days ago was it still enhances the chance of an STD today. Therefore the conditions which bring about STDs have a persistence longer than a single day.
When we used the 1st-order model, we got the following comparison between observed and expected numbers of long runs of consecutive STDs (expected in black, observed in red):
Using the 2nd-order Markov model, we get a result which is only slightly different, but does have more chance of very long runs:
The discrepancy between observed and modeled numbers is less. In particular, with this model the probability of 3 “runs of 12” in the 58-year NOAA-NWS record is about 1 out of 40, which is implausible but not unbelievable, so it’s significant evidence against the model but not very strong evidence.
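One way to estimate a probability like this is by Monte Carlo: simulate many records from the 2nd-order chain and count how often enough long runs appear. The sketch below shows the idea under loud assumptions: the transition probabilities are held constant through the year (the real ones vary with season), the numbers in `probs` are invented for illustration, each simulated record is assumed to start after two non-STD days, and a run counts if it is at least `min_len` days long. It is not the calculation behind the 1-in-40 figure.

```python
import random

def simulate_days(probs, n_days, rng):
    """Simulate n_days of a 2nd-order chain with constant probabilities."""
    days = [0, 0]  # assumption: the record starts after two non-STD days
    for _ in range(n_days):
        state = f"{days[-2]}{days[-1]}"
        days.append(1 if rng.random() < probs[state] else 0)
    return days[2:]

def run_lengths(days):
    """Lengths of all maximal runs of consecutive 1s."""
    runs, current = [], 0
    for d in days:
        if d:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def prob_of_long_runs(probs, n_days, min_len, min_count, n_trials, seed=0):
    """Fraction of simulated records with >= min_count runs of >= min_len."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        runs = run_lengths(simulate_days(probs, n_days, rng))
        if sum(r >= min_len for r in runs) >= min_count:
            successes += 1
    return successes / n_trials

# Invented probabilities and a short record, for illustration only;
# a real estimate would need seasonal probabilities and many more trials.
probs = {"00": 0.05, "01": 0.30, "10": 0.10, "11": 0.40}
estimate = prob_of_long_runs(probs, n_days=3650, min_len=5,
                             min_count=1, n_trials=100)
print(estimate)
```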
And, there’s another factor which should also be considered. As Harold Brooks said in a comment,
There are a number of papers in both the formal and informal literature that changes in damage assessment over the years have led to a decrease in the reported intensity of the strongest tornadoes over the years.
Therefore it’s possible that the trio of runs-of-12 is in part due to the greater likelihood of earlier-in-the-record tornados being classified as F2 or stronger. After all, all three runs-of-12 are from 1967 or before. If tornados were ranked in those earlier records as they are today, we may not have seen so many long runs.