I got another couple of subscriptions to the Climate Data Service (see the end of the post if you want to sign up), and one of them included an interesting question:
I’d also like to ask a question. I’m thinking of making a career change to data science, but am not really sure where to start. I currently work as an analyst for the Department of Defense, so I’m interested in the national security aspects of climate change as well as the human and economic changes coming. (my background is in physics so I’m not new to quantitative analysis)
How did you get your start and what would you recommend as a good path to get there? Thanks!
I got my start doing data analysis (surprise, surprise) in astronomy. It took me some time to acquire some of the needed skills, because when I was a math major the only math class I wasn’t really interested in was: statistics! Of course, back in those days we weren’t blessed with a powerful computer in every laptop, so statistics, and data analysis, weren’t the same as they are today.
When I see job ads for data scientists, they’re usually looking for people who don’t necessarily have stats expertise, but do have some knowledge of data analysis and a background that’s definitely quantitative — those with training in math, computer science, physical science. I have a feeling there’s a lot of “on-the-job learning” going on, due to the demand in this field.
So: what skills would you want to develop? I’m hardly the greatest expert here, but I’ll give you some opinions.
First, let me tell you the definition of statistics: A mathematical science concerning the collection, analysis, interpretation, and presentation of data. So, statistics itself is the (mathematical) science of data. That doesn’t mean it covers everything people do in a career called “data science.” But it does tell you that one of the key aspects is: statistics. Most who train in math and/or science neglect this discipline (lots of programs don’t even require you to study it at all), so get a solid foundation in basic stats. Your employer won’t care about it that much, and it won’t be “most of the job” by any stretch, but you’ll be glad you did.
Another thing to learn is model-building, especially predictive models. A lot of this is classification by machine learning methods. That involves certain types of mathematical models, which you should become familiar with. You don’t have to learn all methods (they’re inventing new ones all the time anyway), but there are some you should know. These include least-squares regression (which you probably already know) and logistic regression (good for binary classification problems), as well as newer methods, including “tree” methods. Those include classification trees/regression trees, and a variant (which is in vogue right now because it’s so good) called “random forests.” You could add support vector machines to the list, and some make good use of neural nets (although I dislike their “black-box” nature, but that’s just me). Anyway, learn a variety of types of models and how they work.
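To make “logistic regression for binary classification” a little more concrete, here is a minimal sketch in plain Python. It is only an illustration: the data are made up, the function names (`fit_logistic`, `predict`) are my own, and the fitting is done by simple gradient descent rather than by whatever a real statistics package would use. In practice you’d reach for a library, but writing one by hand once is a good way to see how the model works.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by gradient descent on the log loss.

    lr (step size) and steps are illustrative choices, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Made-up toy data: one predictor, one binary outcome (not real measurements).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)

def predict(x):
    """Estimated probability that the outcome is 1 for predictor value x."""
    return sigmoid(w * x + b)

print("slope:", w, "intercept:", b)
print("P(pass | 1 hour) :", predict(1.0))
print("P(pass | 4 hours):", predict(4.0))
```

The fitted slope tells you how the log-odds of the outcome change per unit of the predictor, which is the kind of interpretability that makes me prefer this over a black box.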
Many jobs, especially in finance/marketing, talk about “big data.” Basically, it’s when there’s so much data you need more than one computer (distributed processing) to handle the workload. You’ll have a leg up if you know some of the tools for that: Hadoop and MapReduce.
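If the MapReduce idea is new to you, the following toy sketch shows the pattern on a single machine: a “map” step emits key/value pairs, a “shuffle” step groups them by key, and a “reduce” step combines each group. This is not Hadoop’s API; the function names and the tiny word-count example are assumptions of mine, chosen only to illustrate the idea. The point of the real tools is that each step can be spread across many computers.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence (here, all in one process)."""
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Made-up input: in a real cluster each document could live on a different machine.
docs = ["warm warm cold", "cold warm"]
counts = word_count(docs)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work splits naturally across machines, which is what makes the pattern useful when the data won’t fit on one.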
Last, but not least (perhaps most), work with data. I’m a data junkie; I can’t help myself. Because of that, I knew about model-building and statistics and stuff before I ever heard of “data science.” And much knowledge and wisdom comes from experience. A lot of kids straight out of college have trained specifically to be data scientists, but they make “rookie mistakes” because there are so many pitfalls they’ve never seen before. And they’re usually well-trained in model-building and programming, but weak in statistics. For top positions, I’ll take a seasoned pro.
These days there seem to be a considerable number of online training courses offered through EdX. Look into that; “data science” is one of their more popular offerings.
If you want to learn statistics, consider taking the course I’m going to offer soon (I hope). Statistics is too often neglected, and doesn’t seem to be “sexy” enough to be offered as regularly in online courses. Also, I think I teach it better than they do.
Finally: for those interested in the Climate Data Service, the price will go up soon so you might want to subscribe now. To do so, step 1: donate $25 at Peaseblossom’s Closet, step 2: post a comment here (which I will not make public) so I know who you are and where to email it.
This blog is made possible by readers like you; join others by donating at Peaseblossom’s Closet.