For the past few weeks, I’ve been researching Data Science and the best ways to learn it. Last week, I found a lot of great educational sources and stumbled on the above visualization, a “Map of Data Science Skills to Learn,” in a post titled “How Do I Start Learning Data Analysis?” on Udacity’s blog, Climb Higher. I think the visualization is a good outline of all the different types of knowledge you need to acquire over time. It’s intimidating, but no one becomes a data scientist overnight. The prerequisites alone can take over a year to acquire. I think of this learning path like a marathon: nobody runs a sub-four-hour marathon without any training. Data Science is a marathon, not a sprint.
This picture can be overwhelming, so you need to break it down into actionable goals. Currently, I’m working on the Fundamentals, Statistics, and Programming sections. Here are my two prerequisites:
- Math (Calculus & Statistics)
- Programming (Python & R)
I haven’t had a course in Calculus or Statistics in more than eight years. I’ve done a lot of quantitative work over those years, but I feel like I probably need a refresher on both subjects. For Calculus, I’ve been using Khan Academy Mastery quizzes and watching the associated videos for the topics I can remember. The iPad app is impressive: it lets you work through Calculus problems with nothing but the iPad, since it includes a scratchpad and a scientific calculator. Khan Academy also offers lessons in higher-level courses like Multivariable Calculus, Linear Algebra, and Differential Equations. I’m planning to complete my Calculus refresher by the end of April. After that, I think I’ll use the Khan Academy videos as a supplement to a more organized course on MIT OpenCourseWare or Coursera, depending on availability.
I feel pretty good about basic statistics, but I need to test my knowledge of the more complicated subject matter. Khan Academy has a Probability and Statistics section, but I’ve only taken a few mastery quizzes. Last week, I started working on Udacity’s Statistics 101 course, but I can’t watch the videos during my lunch break, so I’m considering other options.
Python: I’m almost halfway through Zed Shaw’s “Learn Python The Hard Way” (LPTHW). The plan is to complete LPTHW by the end of March. Then I need to find a project so I can keep working on my skills.
R: Start Code School’s “Try R” tutorial.
In summary, I’m trying to complete my code tutorials by the end of March, and the Calculus and Statistics refreshers by the end of April. I’ve been looking at different online courses in Data Science, and there are a lot of options across a wide price range. Obviously, getting a certificate from Stanford is the best way to go, but $19,000 is too expensive for me to justify. I’m leaning toward trying a free MOOC like Coursera’s Data Science Specialization, offered in partnership with Johns Hopkins University, or Udacity’s Data Analyst Nanodegree courses.
Coursera is more affordable if I’m interested in receiving an actual certification, but I’m not really interested in certificates or degrees. Coursera also seems to be more statistics- and R-heavy, whereas Udacity seems to focus on Python and software like Hadoop, MapReduce, and Apache Storm. I really want projects to complete so I can include them on my GitHub and website. This makes me lean toward Udacity, which seems to be more focused on industry experience. Still, I think both are probably decent ways to at least familiarize yourself with a lot of key topics in Data Science.
Will they make you an expert data scientist?
Probably not, but I think they’re good free places to start.