Transitioning away from Matlab

In my last article, I talked about how I got a job in industry. I had been programming Python on and off for 10 years when I got my first non-academic job. Having proficiency in a language that people use outside of academia will improve your chances of getting a job in industry. Learning a new language will also grow your abilities as a programmer and will unlock new projects and analyses you might have otherwise been afraid to tackle.

What if you’re in a lab that’s been using Matlab for years and you haven’t had a chance to learn Python? Here’s my guide to transition off of Matlab to Python. There are a few specific links to neuroscience at the end of the article, but it should be useful to anybody approaching Python from a Matlab background.

This is part 1 of an ongoing series on learning Python. Part 2 – picking Python environments, packages and IDEs is here. Part 3 – translating old code with test-driven development is here.

Learn Python the hard way

Python is one of the most popular languages out there – more popular than Matlab by a factor of at least 10 – and it’s the language the vast majority people transitioning from Matlab will choose. There’s two basic strategies you can apply here:

  • Transfer learning (aka the easy way). You have things in Matlab you use every day – array operations, plotting, signal processing, etc. You learn to do the same thing in Python, perhaps with the aid of a cheatsheet. A lot of the Python data science API (numpy, scipy and matplotlib in particular) are very similar to Matlab. They were originally built to replicate functionality in Matlab. With this route, you’ll get moderately productive very fast.
  • Starting from scratch (aka the hard way). Learn the syntax, then do something you couldn’t do in Matlab, for example:
    • Learn algorithms and data structures
    • Make a GUI application (e.g. in PyQT)
    • Make a game
    • Make a dynamic website

My slightly controversial opinion is you’re better off starting from scratch. If you use the transfer learning method you will be less productive than you currently are for a long time. You will feel annoyed at your own incompetence ("why am I doing this to myself? I could do the same thing twice as fast!"). Furthermore, you will have a tendency of doing things the Matlab way (using matrices for everything, index chaining, avoiding for loops) that are error-prone and you shouldn’t be doing in a general purpose programming language.

By starting from scratch you will learn new things you couldn’t do before at all – you will feel like you have new powers. You can write a GUI! You can make dynamic visualizations! You can make games! You can do deep learning! Your day-to-day productivity won’t suffer because you’ll be learning new skills instead of relearning old ones poorly. You’ll end up being more productive in Python than you ever were in Matlab. You’ll write idiomatic code.

How long will it take to learn Python?

500 hours. Ok, that number is made up, but it’s probably not far from the truth. You can learn the syntax in a weekend. You can make your first significant project in a few days. However, proficiency comes with time. Don’t wait until 3 weeks before a technical interview to start learning. I’ve seen candidates do this, it’s not good. You’re better off asking for an interview in Matlab if you don’t have hundreds of hours of Python under your belt. Start today, keep at it every day (1-2 hours a day will suffice) and it will pay off.

Why is Python so popular anyway?

Yoshua Bengio. Again, I’m halfway joking, but one of the biggest reasons people have moved away from Matlab was the adoption of Python for deep learning. It started with Theano, which came out of Bengio’s lab. It built on numpy, scipy, sklearn and jupyter, all of which predated the rise of deep learning. Then came Tensorflow. People at Google were already using Python; Guido van Rossum, creator of Python, was famously employed by Google at one point. Google needed a high-level language to iterate quickly on models; many of the people involved in Theano were involved in Tensorflow. Python made sense.

It could have landed another way. We might all be using Lua instead if Yann had prevailed over Yoshua! But combine industry pressure, lots of money, the education sector needing a good first language, and open source, and Python is a runaway success now. And it might change in the future. Maybe we’ll all be using Swift in the future. Or Rust. Or Julia (my personal favorite!). For now, Python is the language to learn.

I don’t even hate Matlab

I’ve used Matlab extensively. I’ve been deep enough to create my own mex files and using Java. I’ve created GUIs in GUIDE. I have created pretty big codebases and classes. I even coded up my own neural net framework which I never published.

I’m not going to bash Matlab here. You can write good code in Matlab – and many people that have been using Matlab for years end up writing disciplined code. However, I have seen a lot of Matlab code of a certain kind – using matrices for everything (even though Matlab has dataframes and hashmaps!), avoiding for loops (even though it has a great JIT!), using a giant set of globals in GUIDE (there’s a way of writing good GUIDE apps, I’m sure!). It will take a little bit of work to unlearn these old habits if you have them.

A curriculum

If you have never seen or touched Python, you can start by learning the syntax and the basic data structures (tuples, dicts, and lists) through online resources.

Some sample websites are:

Most of these websites will have some sort of live evaluation directly on the website. Pretty soon you’ll need a local install of Python. I recommend installing the Anaconda distribution of Python 3. This includes the conda environment manager, which will allow you to maintain different sets of packages.

Picking a first project

At this point, you could consider using Python for a project (no data science stuff yet). The first app I made was a GUI to upload files to a website. A few years later I made an app in PyQT to annotate physiological recordings with notes. You could make a website. Lots of different projects but basically make sure you cover the basics, meaning:

  • functions – these will trip you up. Python has pass-by-reference semantics for object types. In Matlab, a function cannot modify its arguments (unless it’s a reference object, but these are pretty uncommon). But in Python you can mess with your arguments:
def my_append_fun(a)
    a.append('b')

c = ['a']
my_append_fun(c)
print(c)
# Prints ['a', 'b']

It’s not good practice to do this, but you might do it accidentally and you will be very confused.

  • modules – in Matlab, one file = one function (unless it’s an internal function private to that file). In Python you can have multiple functions and classes inside a file. Each file defines a module that can be imported. Then there are other people’s modules! pip! Understanding how and when a certain module or function is accessible is one of the subtle things about Python.
  • tuples, dicts and lists. A simple write-only type for fixed-length groups; a super-powerful hashmap type; and a variable-length vector. It will take you a while to understand the trade-offs for each of these types. Your assumptions will be wrong! You might think appending to a list in a for loop is super slow because appending to a vector is super slow in Matlab. Wrong! It’s actually super fast.
  • classes. Classes are behaviour + data. Maybe you’ve been doing OOP in Matlab. If not, it’s time to pick it up!
  • strings and file IO. Format strings, with statements, StringIO, regex – all workaday things in a general purpose programming language that you might never have touched in Matlab.

Data structures and algorithms

Consider learning about data structures and algorithms. Many professional programmers are self-taught and never really learn about these fundamentals. They learn intuitively what is slow and fast, and can code many non-trivial algorithms. They might even use complexity analysis (O notation).

Once you learn data structures, your world will open up. This is especially true for people with a Matlab background because the language tends to force you to use the same structure over and over again (the matrix). You might not know what to do with tuples, dicts, lists and objects. You need some solid foundations to transition out of the weird programming model Matlab imposes.

Take the Algorithms and data structures classes on Coursera by Tim Roughgarden. These are the same classes you’d get in CS at Stanford, and they are very good. Really tough. You will feel like your mind is melting – in a good way.

You might try a daily challenge, like hackerrank. There’s Project Euler if you’re more math-inclined.

The data science ecosystem

It’s finally time learn the data science pipeline. Because you will have learned basic Python well, and the tools are very similar to Matlab, your transition will be very smooth. Here’s one tutorial that guides you through the Python data science ecosystem. This means getting familiarized with:

  • matplotlib for plotting
  • numpy for matrices
  • scipy for signal processing
  • pandas for dataframes
  • sklearn for machine learning
  • jupyter for dynamic notebooks

The hardest package to learn for many people coming from a Matlab background is pandas. Why would anyone want to use pandas? Can’t I just use a matrix?

How many times has your PhD advisor told you to label your axes in your plots? A thousand times? Labeling axes is important to understand what the data means. When you index into an unlabelled matrix, say df(:, 7), you’re giving yourself the possibility of forgetting what the data means. Wouldn’t it be better to use df.reaction_time_ms? Yes!

You might even have learned how to do reductions, querying and aggregations on raw matrix. This code might give you the mean reaction time for participant 10:

mean(df(df(:, 1)==10, 7))

That’s bad! What if you add a column in your CSV? Then column 7 becomes column 8, your stats are wrong and you’ll lose months tracking down the issue. This code is super error-prone and makes kittens cry. I’m not saying it’s impossible to do this the right way in Matlab – I’m just saying that’s often how people do it. Compare the pandas way:

df.query('participant_id == 10').reaction_time_ms.mean()

Take the time to read the pandas tutorial in full. You might want to pick up a book as well after a little while, for instance Python for Data Science.

At this point, you’ll need to practice your new skills. Join a Kaggle competition, or follow a data science MOOC.

Specialized tools for neuroscience

At this point you might be productive enough to use Python on a daily basis. Congratulations! It’s a good time to learn about Python neuroscience tools:

  • nilearn for machine learning with MRI
  • brian for spiking neural net simulations
  • pytorch or tensorflow for fitting ANNs
  • deeplabcut to track animals without markers
  • Psychopy for visual stimulus presentation
  • neo for managing electrophysiology data in Python
  • MNE for EEG analysis
  • PyMC3 for Bayesian inference

Know of another indispensable tool? Write it in the comments.

Transition existing pipelines to Matlab

It will be hard to transition pipelines developed over years wholesale out of Matlab and into Python. You’ll want to freeze data obtained after running a pipeline into a file format that you can transport from Matlab to Python. Matlab’s .mat file format is readable in Python. Since v7.3 .mat files are in the hdf5 file format, for which Python has excellent support.

In the future, you might find that you need Matlab for very little, and only one or two recalcitrant pipelines will need to be maintained. You could wrap these pipelines in a docker image, so they keep working for years to come, despite changes in OS and Matlab versions.

Becoming proficient

You might start developing significant codebases. Will you be putting everything in Dropbox? No, you’ll be using git! You’ll need to know the command line. Follow the software carpentry class and learn about the Unix terminal and source control.

To really step up your game, contribute to an open-source project. If you’re unsure where to start, join a Brainhack event – people will ask for volunteers for their projects and they’ll guide you through the process of making a submission. Friendships will be created! Collaborations hatched! Bagels will be had!

Perhaps you’ll feel the need for speed at this point. Tear through billion row datasets with Spark or dask. JIT your for loops or compile them down to C with Cython, numba or jax. Monte Carlo simulations with a thousand parallel chains? Not a problem!

Make it social

People don’t realize how social programming is. As a software engineer at Google, I:

  • Learned about the core tooling in seminars with other engineers
  • Had people review my code
  • Reviewed other people’s code
  • Pair programmed
  • Shared analyses with other people who critiqued the analysis (both the content and the method)
  • Went to retreats to learn new tools
  • Organized a reading group

I learned more from these experiences than from much of the book reading and solitary programming I’ve done. You don’t have to make your learning journey an isolating one. When you do Kaggle, join a team. Go to local meetups – in Montreal for instance, there’s Les Pitonneux, who meet up almost daily. Many groups focus on supporting underrepresented groups in computer science, for example PyLadies.

There are hackerspaces where you might find people in the same situation. You can go to tutorial sessions at conferences. Make friends and create a social support system to help you on your journey.

Conclusion

I’ve had to relearn programming many times over the years. Toolbook, QBasic, VB, Delphi, Actionscript, Perl, PHP – I’ve written significant chunks of code in each of these languages. They’re all either dead or on their way out. Having seen the trajectory of these languages, it feels to me like Matlab is on its way out as well. This doesn’t mean it won’t be used at all – after all, PHP is 10 times less popular today than at its peak, but it still runs Facebook! But it does mean that:

  • There will be more people who know Matlab than people who are hiring people who know Matlab. It won’t do you any favors on your CV.
  • Professionals software engineers have already switched away from Matlab. You will have a lot of trouble hiring professional programmers to create Matlab-based infrastructure in your future lab if you choose to stay in academia.
  • The Mathworks is a single-source vendor, and their software is closed-source. What happens if they run out of business?

It’s time to move away from Matlab. Take the first step today!

This was part 1 of an ongoing series on learning Python. Part 2 – picking Python environments, packages and IDEs is here. Part 3 – translating old code with test-driven development is here.

2 comments

Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s