I haven’t posted much lately because I’ve been hard at work on my new book. It’s titled *Understanding Statistics*, and I expect to finish in a week or two. I’ll be sure to post here when I do, hoping that lots of you will buy it. Even if you don’t need one for yourself, you might know somebody who would enjoy and make good use of it. Who knows, maybe 20 of you will send a copy to Anthony Watts. Maybe he would learn something from it. Irony of the richest kind.

It’s written at an introductory college level for non-math majors, those who haven’t studied calculus. There are some notable differences from the usual such text. For one thing, there’s much more emphasis on *theory* than is usual. Too many texts amount to little more than a cookbook, with *recipes* for statistical procedures and gobs of examples but little or no exposition of *why* it works the way it does, not just *how*.

That’s fine, students learn well that way and usually do well on their tests — they even *feel good* about it. But five years later when they actually need to *use* it, all is forgotten. So they have to go back to the book and start over *from scratch*. My experience is that when you learn the “why,” it sticks. The “how” will be forgotten in five years, but if the need arises, you won’t have to re-learn it from square one. Something about understanding the “why” reaches the core of your brain and makes the re-learning so much easier and faster, and so much less likely to go wrong. Truly *understanding* something stays with you forever.

Often, theory is omitted because it’s considered too difficult for non-math majors. That’s an idea I find both unflattering, and mistaken. My experience is that students, even those who hate and fear math, have more than enough intellect to get it — really understand — if it’s presented clearly enough. That’s the writer’s job and the teacher’s job, so when students find a smattering of theory too difficult, I don’t blame them — I blame the teacher.

Another difference is that I’ve included some “case studies.” These aren’t just data sets used to illustrate a particular method. They’re data sets which I analyze the shit out of. That’s the way statistics often is, and often should be, done. It’s true that there are many circumstances in which data are collected to answer a specific question and you know ahead of time exactly how they’ll be studied and what tests will be applied. But there are surprisingly many situations in which data are acquired and nobody has a clue what they mean. That’s a case in which the “cookbook” approach falls on its face. By showing how to *explore* data and put it under the microscope from multiple angles, I hope to give readers much more power over the data they wrestle with.

Yet another difference is that I’ve left out problem sets and computer instructions. Those will be put into a *study guide* which will be released in a few months so that teachers who wish to use it for a formal course can do so. By separating these functions I hope to make the base text more readable, and I’m a firm believer that books should be written so as to be *read*. The study guide will have to change rapidly to keep up with changes in how data are acquired and made available and how computer tools evolve. I hope that the base text will be considerably more “timeless.”

I probably won’t post much in the upcoming week. But when the new year arrives I’ll be back with a vengeance. Stay tuned.

The book sounds great. The “cookbook” approach to statistics generally leads to bad statistics, so an introductory book that deals with the theory is exactly what is required. The “no warming since [insert date]” claim is a great example of what is wrong with the “cookbook” approach to statistics. The skeptics are just following the “null ritual” and don’t understand that the null hypothesis ought to be the hypothesis to be nullified in order to provide support for their hypothesis. As their hypothesis is that there has been a change in the rate of warming, their H0 should be that the warming has continued at the same rate. However they don’t understand the purpose and meaning of the hypothesis test, so they don’t understand why *their* H0 shouldn’t be that the trend is flat (as that is essentially starting off from the assumption that they are right until “proven” wrong, which is hardly skepticism). Keep up the good work!
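The point about choosing H0 can be made concrete with a small sketch. Everything here is an illustrative assumption rather than anything from the post: the data are synthetic, the long-term trend of 0.017 °C/yr and the noise level are made up, and the standard error assumes white noise (which real temperature data do not have). The sketch fits a trend to the last ten years and compares it against two different null hypotheses.

```python
import math
import random


def ols_slope(t, y):
    """Return (slope, standard error) of an ordinary least squares line fit.

    The standard error assumes independent (white) noise, which is a
    simplification: real temperature series are autocorrelated.
    """
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def z_score(slope, se, null_slope):
    """How many standard errors the fitted slope lies from the null slope."""
    return (slope - null_slope) / se


# Synthetic data: steady warming plus noise (all values hypothetical).
random.seed(0)
LONG_TERM_TREND = 0.017  # deg C per year, assumed for illustration
NOISE_SD = 0.1           # deg C, assumed for illustration
years = list(range(1975, 2013))
temps = [LONG_TERM_TREND * (yr - 1975) + random.gauss(0, NOISE_SD) for yr in years]

# Fit only the most recent decade, as the "no warming since ..." claims do.
slope, se = ols_slope(years[-10:], temps[-10:])

# "Null ritual" H0: the recent trend is zero.
print("z against flat trend:     ", round(z_score(slope, se, 0.0), 2))
# The H0 that should be nullified: warming continued at the same rate.
print("z against long-term trend:", round(z_score(slope, se, LONG_TERM_TREND), 2))
```

Over a window this short the standard error tends to be large compared with the trend itself, so failing to reject a flat trend says very little. The claim of a slowdown only gains support if the second test rejects continued warming.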

I really like the sound of the book, I totally agree about explaining the reasons why a procedure works. I am useless at remembering separate facts, but why things work just sticks for me too. For more complex problems I find complex examples very useful, it sounds like your case studies fit my style of learning exactly.

Hope you had a happy Christmas and merry new Year.

> will send a copy to Anthony Watts.

I know one Canadian who could actually benefit… especially what you said, “Too many texts amount to little more than a cookbook, with recipes for statistical procedures and gobs of examples but little or no exposition of why it works the way it does, not just how”, rang a bell. I think he still hasn’t admitted, or even understood, that his insistence on using Pearson’s r^2 as a meaningful statistic for proxy-reconstruction goodness was wrong-headed. McI *knows* lots of statistics, but *understands* lots less.

I’ll look forward to this with eager anticipation, for many lending libraries outside of academia, here in the UK, have stripped out most of the really useful texts and not just on stat’s. I last studied stat’s over twenty years ago and even then many of the techniques that you have used along the way here over the last few years were never touched upon.

Your approach based on understanding processes rather than rote learning of techniques is to be applauded, this is another reason for my anticipation.

As for calculus, I experienced courses spaced twenty years apart. After the first I understood what was going on but after the second I was left baffled somewhat but saved by much extra work in what little spare time I had on the degree course.

Looking forward to more erudite and illuminating posts in the new year.

Will it be available in ebook form?

Seconded.

Thirded.

Will it be DRM free?

Fourthed. And the best for 2013, we’re entering the teen years. Expecting big changes :D

Looking forward to it.

However, I wonder what you mean by “theory”. It seems to me there are two possibilities: the detailed derivations of the equations used and the logic behind the use of various methods.

The statistics part of my computer science course (at Imperial College, London in the late 1970s) spent most of its time ploughing through the derivations without really explaining the logic leaving me with quite basic questions like “doesn’t likelihood depend on your assumptions about the form of the fit?” – questions which just got puzzled stares from fellow students. I finished up with a reasonably decent degree but think I probably only scraped through on a couple of the maths courses taught in this style (statistics and linear algebra) which was just rote learning with no real understanding, to which I’m not suited at all.

I was all hopeful about your book until I read: “Often, theory is omitted because it’s considered too difficult for non-math majors.” All the derivation stuff could, indeed, be considered too difficult but the more logical/philosophical aspects surely not.

[Response: There's not much in the way of derivations, although there's more than is usual. The emphasis really is on *why*, or as you say, the logic of it.]

Excellent. Don’t forget to tell us when it’s published ;-)

Ed,

I would disagree. Statistics and probability are one of the areas of math where the logic and philosophy are least well understood. Kolmogorov’s measure theory provides a reasonable rationale for the frequentist approach, but falls short on some very basic concepts (including the definition of “random”). Bayesian approaches are reasonably well worked out now–and indeed, most of the probabilities we work with in daily life are subjective. However, the seemingly ad hoc nature of the foundations of Bayesian probability are unsatisfying. That is part of the reason why the course you took was taught in the way it was. Too often the subject is taught as though reading recipes from a cookbook. It could be taught more with an eye to the “whys”, but such a class would be conceptually demanding.

Indeed. However it still needs to be tackled at some level and some sort of conceptual framework (even if it’s just “it’s trickier than it looks”) is necessary. A statistics course which spends half a dozen lectures on derivations of the central limit theorem and least squares estimation but doesn’t mention Bayes and doesn’t mention that OLS error estimates are messed up by autocorrelation¹ is badly skewed.

¹ Actually, I think my course might have gone into this but I was already lost in the details by that point.
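As a rough illustration of the autocorrelation point, here is a minimal simulation sketch. The AR(1) noise model, its parameter value of 0.8, and the sample sizes are assumptions chosen for illustration. It fits a straight line to pure noise many times and compares the standard error that the textbook OLS formula reports with the spread of slopes actually observed.

```python
import math
import random
import statistics


def ar1_noise(n, phi, sd, rng):
    """Generate n values of AR(1) noise: x[i] = phi * x[i-1] + white noise."""
    x = []
    prev = 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0, sd)
        x.append(prev)
    return x


def ols_slope_and_se(y):
    """Fit y against t = 0..n-1; return the slope and its white-noise standard error."""
    n = len(y)
    t = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)


def compare(phi, n=100, trials=500, seed=1):
    """Return (actual spread of fitted slopes, average naive OLS standard error)."""
    rng = random.Random(seed)
    slopes, naive_ses = [], []
    for _ in range(trials):
        slope, se = ols_slope_and_se(ar1_noise(n, phi, 1.0, rng))
        slopes.append(slope)
        naive_ses.append(se)
    return statistics.stdev(slopes), statistics.mean(naive_ses)


for phi in (0.0, 0.8):
    actual, naive = compare(phi)
    print(f"phi={phi}: actual spread / naive standard error = {actual / naive:.2f}")
```

With uncorrelated noise the two numbers should roughly agree. With positively autocorrelated noise the true scatter of the slope is noticeably larger than the formula admits, which is how overconfident trend claims arise.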

> However, the seemingly ad hoc nature of the foundations of Bayesian probability [is] unsatisfying.

Yeah, it’s as if you just have to have a knack for it. I hope Tamino makes it all clear even though no one else has.

http://plato.stanford.edu/entries/logic-inductive/

explains what Bayesian reasoning is about. I don’t find the subject poorly presented.

There is also E.T. Jaynes’s “Probability Theory: The Logic of Science”. Reviewers find this book to be the ‘best’ introduction to Bayesian analysis.

You know, that sounds like exactly the kind of statistics book I need on my bookshelf. Most of the time, I’m less interested in the mechanics of how it’s done, but more why it’s done, and what does it mean to do it that way.

Heaven knows I could have used a stats text like that when I studied it at university. A single stats subject that comprised less than 1% of the credit requirements for my engineering degree, was taught by a lecturer who could put a caffeine addict to sleep, only required a bare pass, and was never touched again in the rest of my studies. Oh, how I could have used a book like that…

Sounds great! I’ll definitely be picking one up.

“Who knows, maybe 20 of you will send a copy to Anthony Watts. Maybe he would learn something from it. Irony of the richest kind.

It’s written at an introductory college level for non-math majors, those who haven’t studied calculus.”

Given that Watts apparently doesn’t even understand basic algebra (the various “changing baseline changes trend” fiascos), unfortunately it is still likely aimed at far too high a level for him.
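For anyone who wants to check the algebra behind the baseline remark, here is a minimal sketch using a made-up temperature series (the numbers are hypothetical, not real station data). Converting to anomalies subtracts a constant, so switching the baseline period shifts every value by the same amount and leaves the fitted trend unchanged.

```python
def ols_slope(t, y):
    """Slope of the ordinary least squares line through (t, y)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    return sxy / sxx


def anomalies(years, values, base_start, base_end):
    """Subtract the mean over the baseline period [base_start, base_end]."""
    baseline = [v for yr, v in zip(years, values) if base_start <= yr <= base_end]
    base_mean = sum(baseline) / len(baseline)
    return [v - base_mean for v in values]


# Hypothetical annual temperatures: a steady rise plus a deterministic wiggle.
years = list(range(1951, 2011))
temps = [14.0 + 0.01 * (yr - 1951) + 0.05 * ((yr * 7) % 5 - 2) for yr in years]

anom_a = anomalies(years, temps, 1951, 1980)  # one baseline period
anom_b = anomalies(years, temps, 1981, 2010)  # a different baseline period

slope_a = ols_slope(years, anom_a)
slope_b = ols_slope(years, anom_b)
offset = anom_a[0] - anom_b[0]

print("trend, baseline 1951-1980:", slope_a)
print("trend, baseline 1981-2010:", slope_b)
print("constant offset between the two anomaly series:", offset)
```

The two anomaly series differ only by a constant offset, so any apparent discrepancy between data sets that use different baselines disappears once they are put on a common one.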

Are you publishing it under your real name or under a pseudonym? Publishing under ‘Tamino’ would be an awesome signal to academia.

[Response: William Sealy Gosset already did that when he published the "t test" under the pseudonym "Student."]

I love the sound of this – it’s already on my book list even before you’ve finished it. I already feel I’ve learnt a lot from your posts here and they certainly demonstrate you can communicate ideas and concepts clearly. If it lives up to expectations I may well be recommending my department buys a set!

That sounds very cool.

If it becomes available on Amazon I will definitely be getting a copy.

Looking forward to the “back with a vengeance” part. Happy New Year!

I look forward to it!

Yay Hooray! Will be adding it to my science library. Thanks for all your hard work on this blog and now with the book. You’re a treasure!

Most statistics texts work from the premise that the student is trying to design an experiment. As you are aware, many of the useful data in the earth sciences are ‘found’ data–collected for other purposes. It has been my experience that the ‘design’ approach is very confusing when trying to understand earth science data because of this difference in emphasis. I applaud your approach to show how to use statistics to explore these earth science data sets.

Tamino, do you mean that the *Cartoon Guide to Statistics* isn’t enough? I’ll be glad to get the book provided it isn’t above my pay grade.

Please give lots of code examples in R in your book. Statistics today is now almost a branch of computer science (or vice versa)

[Response: They'll be given in the accompanying study guide. I want the book itself still to be relevant 100 years from now.]

No, computational statistics, physics, chemistry, biology, etc. are not branches of computer science.

Neither is investment banking. :-)

Hi David

As a statistics major, I now spend all day writing code to transform data and getting R to parallel process (not an easy task).

While all the CompSci majors spend their days working on Microsoft Project :)

Computational statistics is a point where branches of computer science and statistics have grafted onto each other due to prolonged close proximity; you need a solid basis in *both* branches to succeed.

Anon & dikranmarsupial — Just because one uses a subject doesn’t mean one is one. For example, we all use English but none of us here is a professional at it.

I am essentially a computer scientist, I work in a computer science department, my research is in machine learning, which is a branch of statistics in which computer science features very prominently. My research has elements of both computer science and statistics (and engineering), it isn’t purely one thing, nor the other, but a combination of them both. There is no real point in snobbery or compartmentalising, most advances in science are due to novel combinations of ideas, so the interfaces between fields generally are the most productive areas to work in.

Hi David

I see you are a CS professor. CS is another branch of mathematics (applied logic). So aren’t we both really just applied mathematicians?

Anon — I’m retired now giving me the time to explore other (actual) sciences. I think ‘computer science’ is misnamed. ‘Algorithmal engineering’ would be more descriptive. Algorithmal? Well the fellow next door isn’t an ‘electric engineer’, is he?

dikranmarsupial — I would say that machine learning is a part of algorithmal engineering which applies certain mathematically stated concepts, often thought of as part of statistics to devise algorithms which can be said to ‘learn’.

Anon — Some algorithmal engineers apply mathematics, all are expected to apply logic. I suppose ‘applied logician’ is easier on the ear than ‘algorithmal engineer’.

These days, much of the state of the art in machine learning is almost indistinguishable from Bayesian statistics. The idea of “learning” has become more or less just a relic of its original history in neural networks. It has since pretty much moved away from any real engagement with biological relevance, and little to do with AI, so it is hard to view the model as having “learned” about the problem any more than a traditional linear regression model has. Bayesian statistics define the integrals you need to evaluate for a particular problem, but working out how to approximate those integrals in a feasible manner is usually the tricky part for most non-trivial problems, and that is a computational issue which has little or nothing to do with statistics. The software (such as BUGS) has not yet reached the point where users don’t need to have an understanding of the computational issues.
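A toy sketch of the "define the integral, then approximate it" split described above. The example is an assumption chosen because the exact answer is known: 7 successes in 10 trials with a uniform prior, where the posterior mean is (k + 1) / (n + 2). The approximation draws parameter values from the prior and weights each by its likelihood, which is only workable because there is a single parameter.

```python
import random


def posterior_mean_monte_carlo(k, n, samples, seed=0):
    """Approximate E[theta | data] for a binomial likelihood with a uniform prior.

    Draws theta from the prior and weights each draw by the likelihood
    theta**k * (1 - theta)**(n - k); the ratio of weighted sums approximates
    the ratio of integrals that defines the posterior mean.
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(samples):
        theta = rng.random()
        weight = theta ** k * (1.0 - theta) ** (n - k)
        weighted_sum += weight * theta
        weight_total += weight
    return weighted_sum / weight_total


K, N = 7, 10  # hypothetical data: 7 successes in 10 trials
exact = (K + 1) / (N + 2)  # known closed form for a uniform prior
approx = posterior_mean_monte_carlo(K, N, samples=50_000)

print("exact posterior mean:      ", exact)
print("Monte Carlo approximation: ", approx)
```

In one dimension this naive approach is fine. With many parameters almost all prior draws receive negligible weight, which is why realistic problems need methods such as MCMC and why the computational side becomes the hard part.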

“Computer Science is no more about computers than astronomy is about telescopes” E. Dijkstra (perhaps)

Why not Python?

Will the R code call canned routines? If not, why not readable Python?

Hi Pete

Most statisticians now use R for their work.

Python is a great language, but it was invented only so people could stop using Perl :)

All we need now is a language to be invented so people can stop using R ;-)

[Response: I like R.]

I find R a bit difficult to remember because the syntax is too idiosyncratic, but the vast library of packages makes it invaluable (and hard to replace). It is an excellent tool for most working statisticians though.

“I like R”

— Horatio’s versification of Tamino & The Rsupial

I like R, but I don’t love R

She’s R’d, as you’ll discover

A little idiosyncRactic

When it comes to mathematic

There shoulda been a cap in the “discoveR”

Apologies.

Off topic, I suppose.

Some sea level rise consequences from Fairfax Climate Watch:

http://climatewatch.typepad.com/blog/2012/12/estimated-future-ice-loss-rates-updated-dec-2012.html

which uses conservative (in the engineering sense) estimates of future SLR.

The growth of sea level is actually not exponential but rather sigmoid, i.e., S-shaped. Otherwise, the article is well done.

Tamino, sorry for OT, but, happy new year and new book:

Do you plan to write a book, or can you recommend a book, in which the “whys” of statistics are explained more mathematically?

[Response: In many cases the "why" is explained mathematically. But to stay within the level of non-calculus non-math major, there are limits to how far I can go.]

For the subset of Bayesian statistics, I can recommend Jaynes:

http://www.cambridge.org/gb/knowledge/isbn/item1155795/?site_locale=en_GB

Thanks

Horatio wrote this with Tamino in mind a while back.

Good luck with the book.

Should be a very good one (and badly needed)

“I love stats”

–by Horatio Algeranon

(with apologies to Jeffrey Moss — “I love trash” — and Oscar, of course)

Oh, I love stats

Anything binomial or Bernoullian or Gaussian

Anything Fuzzy or Frequentist or Bayesian

Yes, I love stats

I have here a plot that is scattered and abstruse

It’s chock full of data with noise that is puce(!)

A gift from denialists who are quite obtuse

I love it because it’s stats!

Oh, I love stats

Anything binomial or Bernoullian or Gaussian

Anything Fuzzy or Frequentist or Bayesian

Yes, I love stats

I’ve an E&E paper that’s hot off the press

It’s all full of holes and it sure is a mess

While most wouldn’t read it, unless under duress

I’ll debunk it with glorious stats!

Oh, I love stats

Anything binomial or Bernoullian or Gaussian

Anything Fuzzy or Frequentist or Bayesian

Yes, I love stats

I’ve an HP-25 and an old Macintosh

A (free!) “R” stats package that works great (by gosh)

The HP quit working when she went through the wash

But I keep her because she did stats!

Anything binomial or Bernoullian or Gaussian

Anything Fuzzy or Frequentist or Bayesian

Yes, I love stats

Yes, I love, I love, I love stats.

*Note: “Puce Noise” is also known as “Brown noise”* or “red noise” (ie, Brownian noise). “Puce” is a brownish-red hue, of course. (Not to be confused with “brown-nose”, like Horatio.)

I look forward to your book, but I hope you have read Taleb’s Fooled By Randomness. It’s about numbers and probability, and a whole lot else.

I see that there are a couple of other books by Taleb that may be of interest, ‘The Black Swan: The Impact of the Highly Improbable’ and ‘Antifragile: How to Live in a World We Don’t Understand’.

Tamino, I know it is a never-ending battle but I would be interested if you or any of the commenters have a perspective on the recent paper “Polynomial cointegration tests of anthropogenic impact on global warming” in which some economists attempt to show no correlation between CO2 and the climate record. It was recently given big play at WUWT.

Apologies if this has already been discussed elsewhere.

I think it was on Eli’s Rabett Run.

“ClimEconomists”

— by Horatio Algeranon

ClimEconomists

Like ducks at fairs

Shoot one down

And one reappears.

Tamino has already commented on the core technique, in:

http://tamino.wordpress.com/2010/03/11/not-a-random-walk/

http://tamino.wordpress.com/2010/03/16/still-not/

Take-homes?

* Tests (including the Augmented Dickey-Fuller, ADF, test, which can detect nonstationary series with added _linear_ trends) inappropriately applied to climate data (over periods with distinctly _non-linear_ forcings), resulting in false claims of non-stationary, ie random, climate behavior. If you look at the statistics of temperatures without incorporating any knowledge of changing forcings, it should be no surprise that temperatures appear to wander around. Also a failure to apply their tests to any linear subsets of climate data, which clearly show that temperatures are trend-stationary WRT forcings, or (as Tamino pointed out) to apply tests such as the covariant-ADF that can deal with underlying non-linear trends.

* Followed by applying tests for _random_ behavior to _deterministic_ observations of forcing histories, a rather appalling epistemological error. Forcings aren’t random.

* Leading to such amazing conclusions as a decreasing CO2 forcing over time regardless of concentration, a negative (!!!) forcing from increasing methane, and other physical impossibilities – all due to inappropriately applying economic tests for random series, when regression of the original temperature and forcing data is the proper technique for trend-stationary series.

In short, nonsense. And a rather clear indication that the authors of that paper just didn’t understand the proper application of the tests they used.
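For readers unfamiliar with the distinction those take-homes rest on, here is a minimal simulation sketch. It is not an ADF test and uses no climate data; the drift and noise values are arbitrary assumptions. It contrasts a random walk with drift, whose departures from the trend line keep growing, with a trend-stationary series, whose departures stay the same size however long you wait.

```python
import random
import statistics


def random_walk_with_drift(n, drift, sd, rng):
    """Each step adds the drift plus a random shock; shocks accumulate forever."""
    y, level = [], 0.0
    for _ in range(n):
        level += drift + rng.gauss(0, sd)
        y.append(level)
    return y


def trend_stationary(n, drift, sd, rng):
    """A deterministic trend plus noise that does not accumulate."""
    return [drift * (t + 1) + rng.gauss(0, sd) for t in range(n)]


def spread_about_trend(generator, n=100, drift=0.02, sd=0.1, trials=2000, seed=42):
    """Std. dev. across simulated series of the departure from the trend line,
    measured at step 10 and at the final step."""
    rng = random.Random(seed)
    early, late = [], []
    for _ in range(trials):
        y = generator(n, drift, sd, rng)
        early.append(y[9] - drift * 10)
        late.append(y[n - 1] - drift * n)
    return statistics.pstdev(early), statistics.pstdev(late)


for name, gen in (("random walk", random_walk_with_drift),
                  ("trend-stationary", trend_stationary)):
    early, late = spread_about_trend(gen)
    print(f"{name}: spread at step 10 = {early:.3f}, at step 100 = {late:.3f}")
```

A series that stays within a fixed band around a physically determined trend behaves like the second case, which is why ordinary regression on the forcings is the natural tool for it.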

James Annan discussed it here:

http://julesandjames.blogspot.se/2012/11/polynomial-cointegration-tests-of.html

and Tamino here:

http://tamino.wordpress.com/2010/03/11/not-a-random-walk/

Honestly, I don’t think we can expect Mr. Watts to understand statistics… I just wish he could learn about _anomalies_ and their implications. Then he could stop accusing NCDC of incompetence and fraud every time he discovers some weird discrepancy that is actually just due to different baselines and/or changing stations over time…

Why polynomial? Because if you do it linearly it’s obvious crap? Just speculatin’

Apparently it was ‘prebutted’:

http://julesandjames.blogspot.com/2012/11/polynomial-cointegration-tests-of.html?showComment=1353776076482#c2950444256701813082

Also here:

http://ourchangingclimate.wordpress.com/2010/03/01/global-average-temperature-increase-giss-hadcru-and-ncdc-compared/

Time for the next round of septic nonsense. Donna Laframboise, secret agent.

[Response: Donna Laframboise, the delinquent infant who mistook herself for knowing anything about climate.]

Why are climate denialists so frakking stupid?

It is hard to know where to start…

Right. WWF ‘expert reviewers’–bad. Monckton-wannabe ‘expert reviewers’–good (or at least ignored.)

Thanks for the extensive responses.

Put the Sebago Lake stuff in it. It’s some of your best analysis.

This sounds great. I’m studying undergraduate psychology and psychophysiology; there is a lot of stats, no calculus so far. I’ll be sure to grab a copy!

Tamino, this graph of yours pops up where I post, and I noticed the trend (1975 – 2000: 0.159C/decade) is different to your choice in the ‘You Bet’ post (1975 – 2007: 0.181C/decade).

Would you consider doing an update on You Bet at this time, using the trend you worked with then? It’s not yet 2015, but a reminder of what you think it would take to nullify the warming trend since 1975 would be useful for a few discussions I’ve got going, and a couple of skeptics have referenced your old posts. This would also make the information more readily accessible for punters who don’t have the SkS page of your old posts at their fingertips.

Happy New Year,

barry.