
Don’t Forget Your Python in the Data Science Jungle

Charles · May 15, 2017 · 3 mins read

Okay, so Python was named after Monty Python’s Flying Circus and not the reptile, but it seemed like a good title so I went with it. Either way, Python is becoming increasingly important in Data Science and Machine Learning.

For those not familiar with Monty Python’s Flying Circus, here’s a quick recap of the top 10.

As a software engineer, I’ve worked with a few scripting languages over the years, like bash, batch, Ruby, JavaScript, and even VBScript. I’ve also worked with a handful of compiled, statically typed languages like C/C++, C#, Java, and VB.NET, but I’d never dived that deep into Python. That was until I started getting into machine learning and data science.

In all honesty, I tried to avoid Python for the last few years because I didn’t see the need or value in learning yet another scripting language, but that all changed when I noticed the overwhelming support and incredibly active development community behind Python. That, combined with the astounding success of distributions and libraries like Anaconda and scikit-learn, made it an easy decision.

Truly, my hat’s off to the community. I don’t think I’ve ever experienced such support for both open and closed source projects, not to mention support for non-Python-centric applications through the use of bindings. The closest I’ve come was with Ruby, but for the most part, it started and ended with Ruby-centric projects.

I highly recommend taking a look at the Python courses on DataCamp.com and MachineLearningMastery.com. They are both great ways to get started with a focus on just the tools and skills needed to achieve results.

You’ll undoubtedly come across lessons in the R programming language as well. If you’re new to both languages, start with Python and then dig into R once you’re comfortable.

Here are a few Python references I’ve found helpful along the way:

  • Python.org — The mothership. You’ll save countless hours and headaches by taking an hour to read the documentation.
  • Python Tutorial — Informal, getting started tutorial for Python 3.x
  • Programiz Python Tutorials — Great beginner tutorial that covers most aspects of the language itself.
  • Python 2.x or 3.x for Machine Learning: Quora Discussion — I say 3.x or, better yet, don’t get locked in to a single local setup. Here’s a RealPython.com Tutorial on Virtual Environments, which let you keep multiple independent Python environments side by side locally, each tied to its own interpreter and set of packages. If you’re familiar with Ruby, think of it as Python’s approach to rvm or rbenv.
  • SQLAlchemy is a Python SQL toolkit and Object Relational Mapper (ORM).
  • Anaconda Cloud is where data scientists can collaborate and share their ideas and work publicly. Anaconda also provides an installer and package manager that make installing data science packages seamless. It’s my go-to for local development.
  • scikit-learn is “Machine Learning in Python”, built on NumPy, SciPy, and matplotlib.
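
To make the virtual environment idea from the list above a bit more concrete, here’s a minimal sketch that uses the standard library’s `venv` module to create an isolated environment for a project. The helper names `create_project_env` and `env_python` and the `.venv` folder name are my own choices for illustration, not something the tutorial prescribes, and I skip installing pip by default just to keep the sketch quick.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path, with_pip: bool = False) -> Path:
    """Create an isolated environment in project_dir/.venv and return its path.

    The ".venv" name is only a common convention (an assumption here).
    Pass with_pip=True if you want pip bootstrapped into the environment.
    """
    env_dir = project_dir / ".venv"
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


def env_python(env_dir: Path) -> Path:
    """Return where the environment's own interpreter is expected to live."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    # Use a throwaway directory so running the sketch leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        env = create_project_env(Path(tmp))
        print("Environment created at:", env)
        print("Its interpreter should be at:", env_python(env))
```

In day-to-day use you’d more likely run `python3 -m venv .venv` from a terminal and then activate the environment (`source .venv/bin/activate` on macOS or Linux), but the script shows what is going on underneath: each project gets its own folder with its own interpreter and its own packages.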

As always, if you have questions, please leave them in the comments or reach out over Twitter.
