In general, top companies increasingly expect data scientists to hold an advanced degree (M.S. or Ph.D.) in a quantitative field. It wasn’t always so: many data gurus, among them election forecaster Nate Silver, Moneyball’s Paul DePodesta and Cloudera’s Jeff Hammerbacher, have only bachelor’s degrees.
But a one- or two-year M.S. combined with solid work experience is quickly becoming the norm. The degree doesn’t have to be in data science; it could be in econometrics, physics, statistics, computer science, applied mathematics or engineering. The required skills can be broadly split into two categories:
- Hard (technical)
- Human (interpersonal)
Hard skills:
- Maths – Once you have the basics, such as calculus and linear algebra, dig into matrix computation, diffusion geometry and similar topics in applied mathematics.
- Statistics – Michael Sanders explains: “Understanding correlation, multivariate regression and all aspects of massaging data together to look at it from different angles for use in predictive and prescriptive modeling is the backbone knowledge that’s really step one of revealing intelligence.” This means interviewers will be looking for core competence in statistical tools such as R and SciPy.
- Programming Languages – Python, C/C++/CUDA, Java
- Relational Databases – In particular NewSQL, the “highly-scalable, horizontally-distributed systems” such as Cloudera Impala, Clustrix and VoltDB.
- Distributed Computing Systems – Get comfortable with the Apache product family (Hadoop, for example), and learn to speak the language of caching, sharding and scalability.
- Data Mining – Supervised algorithms (Naïve Bayes, decision trees, neural networks, etc.) and unsupervised ones (association rules, clustering, etc.).
- Data Modeling – Sanders says: “Knowing the difference between a fact table that is put together well and one that is faulty with semi-structured unconstrained keys makes all the difference in how easily you can trust and massage the data you’re trying to capture.”
- Predictive Modeling – Harris classifies predictive modeling as one of four core competencies (along with SQL, statistics and programming).
- Machine Learning – Andrew Ng’s free Machine Learning course on Coursera has produced distinguished alumni, like Kaggle winner Xavier Conort.
- Visualization – D3.js, Google Visualization API, Tableau
Human skills:
- Domain Expertise – Perhaps the first commandment of data science is “Know Thy Data.”
- Creativity and Curiosity – Netflix asks each of its data science applicants to come up with a framework that solves a problem.
- Storytelling – Once you’ve struck oil in that data, you’ll be responsible for selling your discovery to everyone in the organization.
- Project Management – In a commercial environment, your job is to produce actionable, profitable insights from data.
- Ethics – As a data scientist, you are all but guaranteed to be on the front line of future ethics debates.
- The Elevator Speech – Harris explains what Chris Pouliot, Director of Algorithms and Analytics at Netflix, is really looking for in candidates:
“An advanced degree in a quantitative field; hands-on experience hacking data (ideally using Hive, Pig, SQL or Python); good exploratory analysis skills; the ability to work with engineering teams; and the ability to generate and create algorithms and models rather than relying on out-of-the-box ones.”
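The Statistics item above names multivariate regression as backbone knowledge. As a minimal sketch of what that means computationally, the Python below fits an ordinary least-squares model by solving the normal equations (XᵀX)β = Xᵀy with plain Gaussian elimination. The function names `fit_ols` and `solve` are hypothetical, and in day-to-day work you would use a tool such as R or SciPy rather than hand-rolling the solver.

```python
# Sketch: multivariate least-squares regression via the normal equations,
# (X^T X) beta = X^T y, solved with Gaussian elimination (stdlib only).
# fit_ols and solve are hypothetical illustration names.


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the caller's lists are not modified.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick the row with the largest pivot to reduce rounding error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] for y ≈ intercept + sum(b_i * x_i)."""
    # Prepend a column of ones so the first coefficient is the intercept.
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build A = X^T X and b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(A, b)


# Toy data generated exactly from y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [1 + 2 * a - 3 * b for a, b in rows]
coef = fit_ols(rows, y)
print([round(c, 6) for c in coef])  # [1.0, 2.0, -3.0]
```

Because the toy data has no noise, the fit recovers the generating coefficients; with real data the same code returns the least-squares estimates.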
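The Distributed Computing Systems item asks you to speak the language of sharding. A minimal sketch, assuming simple hash-mod-N partitioning (one common scheme, not the only one): each key is hashed to choose one of several shards, so a read or write touches only the shard that owns that key. `shard_for` and `ShardedStore` are hypothetical names, and the in-memory dicts stand in for what would be separate servers.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    # hashlib yields the same digest in every process; Python's built-in
    # hash() on str is randomized per process, so it would route keys
    # inconsistently across machines or restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical key-value store split across num_shards dicts."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # Only the owning shard is written.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        # Only the owning shard is read.
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(4)
for i in range(100):
    store.put(f"user:{i}", i)
print(store.get("user:42"))  # 42
print([len(s) for s in store.shards])  # per-shard counts vary but sum to 100
```

Because the shard index is `hash mod num_shards`, changing the number of shards remaps most keys; distributed systems often use consistent hashing to limit that data movement.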
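Of the supervised algorithms listed under Data Mining, Naïve Bayes is the easiest to write from scratch. The sketch below, assuming categorical features stored in dicts, counts how often each feature value occurs per label and picks the label with the highest log-probability, using add-one (Laplace) smoothing over the observed values so that an unseen value does not zero out a class. `train_nb` and `predict_nb` are hypothetical names, and the weather data is a toy set made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """examples: list of (features_dict, label). Returns count tables."""
    label_counts = Counter(label for _, label in examples)
    # feature_counts[label][(feature, value)] -> count
    feature_counts = defaultdict(Counter)
    # Distinct values observed per feature, used in the smoothing denominator.
    feature_values = defaultdict(set)
    for feats, label in examples:
        for f, v in feats.items():
            feature_counts[label][(f, v)] += 1
            feature_values[f].add(v)
    return label_counts, feature_counts, feature_values


def predict_nb(model, feats):
    """Return the label maximizing log P(label) + sum log P(value | label)."""
    label_counts, feature_counts, feature_values = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in label_counts.items():
        score = math.log(n_label / total)  # log prior
        for f, v in feats.items():
            # Add-one smoothing: (count + 1) / (n_label + number of values).
            num = feature_counts[label][(f, v)] + 1
            den = n_label + len(feature_values[f])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
]
model = train_nb(examples)
print(predict_nb(model, {"outlook": "overcast", "windy": "no"}))  # play
print(predict_nb(model, {"outlook": "rainy", "windy": "yes"}))  # stay
```

Working in log space avoids numeric underflow when many feature probabilities are multiplied together.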