Last Monday was my first post on Must Love Data Mondays. In that post, I talked about why I wanted to write a weekly post about Data & Code. I also wanted to learn more about the data science field and review its resources, projects, and job prospects. I feel like data science lets me work on both of those interests, data and code, while also combining a third interest of mine: articulating the meaning of the information.
Last week, I did a lot of research about data science. I read over 20 articles on the topic, and I feel like I have a basic plan for someone interested in data science. This Quora article is really helpful, and a lot of my information comes from its links. This led me to ask the question, “What are the prerequisites you need to complete to start studying data science?”
- A strong math background.
- Knowledge of at least one programming language.
“Let none ignorant of geometry enter here.”
I’ve been listening to The Story of Philosophy by Will Durant, and I came across the above quote from Plato. A rule for those interested in entering the field of data science could read, “Let none ignorant of calculus enter here.”
I don’t think you need to have a PhD in Statistics or Physics, though it wouldn’t hurt, but you should have a solid foundation in Multivariable Calculus and Linear Algebra. MIT OpenCourseWare offers both Multivariable Calculus and Linear Algebra courses, which include the syllabus, required textbook, readings, assignments, exams, and lecture videos. On Coursera, Brown University offers a Linear Algebra course, Coding the Matrix: Linear Algebra through Computer Science Applications, which began on 2/2/15. I’ve been using Khan Academy to brush up on my Calculus before jumping into a full-blown course on MIT OCW or Coursera.
The second prerequisite is programming. If you don’t have any programming experience, I think this link offers a list of great sources. Having the ability to program is a must. I’m not saying you need to be a world-class software engineer, but you need to be able to read and understand code well enough to use it in your data analysis.
I’ve had a lot of experience with a number of different languages, but I’ve had to jump from language to language because of different job or education demands. It’s made me feel like I never really grasped any one language. That’s why, earlier this year, I decided to concentrate on learning one modern language. It was down to either Ruby or Python, and since I knew a Ruby developer, I chose Ruby. It’s been going well and I have no complaints with Ruby. I think it’s a great language, but after completing last week’s research it’s become apparent that Python is the language for data science. That’s why I’ve decided to switch from learning Ruby to Python.
There are a number of other technologies and languages that are also necessary, but none of that matters if you don’t understand basic programming. Plus, Python is a good first language to learn. I listed a link above with a lot of resources on learning to program. I like and use Learn Code The Hard Way. I was using their Learn Ruby The Hard Way (LRTHW), but I switched to their most popular book, Learn Python The Hard Way (LPTHW).
These two prerequisites will be common themes for the next few months of Must Love Data Mondays. As I work my way through refreshing the skills I need for more serious study, I’ll use this weekly blog post to write about my experiences, what I’m learning, data-related news, and articles I’m reading.
All while my wife thinks I’m weird for practicing calculus problems on our iPad.