Scientific Python in the browser: ipython notebook

By far the most popular post on this blog is a review of several Python integrated development environments (IDEs) geared toward science. Coming from a Matlab background, it’s natural to search for something Matlab-like to replace it – an IDE with integrated editor, code execution, plotting, benchmarking, file management, etc.

An increasingly attractive alternative is the IPython Notebook. The IPython Notebook interface, which runs in the browser, allows one to write and run interactive notebooks which combine code, documentation – including Markdown and LaTeX equations – and interaction seamlessly. It’s not unlike Mathematica, Maple, or RMarkdown.

You can try out an interactive IPython notebook session on nature.com.

I covered the IPython notebook a couple of years ago, back when it was a relatively new tool, but it’s become a lot more powerful as it has matured and its ecosystem has grown. It has become an excellent tool for running and documenting exploratory analyses, while the rest of the Python ecosystem – IDEs, IPython console, debugging tools, etc. – can be leveraged for batch processing and standardized analyses.

Here I highlight some of the more advanced features of the IPython notebook, with particular focus on recent additions.

Reporting

The IPython notebook particularly shines for creating narrative reports – a form of literate programming which is an excellent workflow for data analysis.

A narrative report mixes code, plots, and a text narrative that highlights results, non-results, thoughts, and concerns. A first draft of a narrative report might sound like stream-of-consciousness beat poetry meets data analysis. A bit of editing tightens the narrative and serves to aggregate and summarize one’s thoughts – map-reduce for the brain. The report can then be used for self-archival and sharing insights with other team members. It can also be used to support open science.

Matlab offers this possibility with cell-mode publishing, but the IPython notebook is leaps and bounds ahead of Matlab’s report generation. The notebook interface is particularly well-adapted for narrative reports, as it transparently mixes code, plots, printouts, text, Markdown, and LaTeX.

An IPython notebook – which includes code, text, and the results of computations – is essentially a JSON-formatted file with an .ipynb extension. An .ipynb file can be shared, stored, viewed, and converted in a number of ways.
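To make the "JSON-formatted file" point concrete, here is a minimal sketch that builds a tiny notebook as a plain dictionary, serializes it, and reads it back with the standard JSON parser. The layout assumes the version-4 notebook format with a top-level "cells" list; older notebooks nest their cells under a "worksheets" key instead, and the cell contents are made up for illustration.

```python
import json

# A minimal notebook in the version-4 layout (an assumption; older
# notebooks nest their cells under a "worksheets" key instead).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A tiny narrative report"],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}

# Serialize to the kind of text an .ipynb file contains...
text = json.dumps(notebook, indent=1)

# ...and read it back with nothing but the standard JSON parser.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is ordinary JSON, any tool that can parse JSON can inspect or transform a notebook without running it.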

Interactivity

A recent addition to the notebook interface is interactivity with widgets. At its most basic, one can link a slider widget with a callback to a function to create an interactive plot, e.g.:

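%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
# In IPython 2.x/3.x this was IPython.html.widgets; since IPython 4
# the widgets live in the separate ipywidgets package.
from ipywidgets import interact

def plot_sine(period=10):
    # Plot a sine wave over [0, 10] with the given period
    x = np.linspace(0, 10, 100)
    plt.plot(x, np.sin(x * 2 * np.pi / period))

# Slider for period, from 2 to 20 in steps of 0.5
interact(plot_sine, period=(2, 20, .5))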

Other JavaScript-based widgets are available, including buttons, checkboxes, dropdowns, and various containers. The documentation is a bit anemic at this point, the widgets can be buggy, and interactive plots don’t work in the notebook viewer, but hopefully these are growing pains that will be fixed in new versions. Nevertheless, it’s usable – I recently used it to interactively tag data series as same/different with interactive buttons – and it has a lot of room to grow.

Multi-language support

IPython and the notebook interface have proven so popular that a recent effort has been made towards supporting multiple languages, both at the command line and in the notebook interface. The Jupyter project, which, as I understand it, is a fork/superset/new version of IPython, supports interactive notebooks for Python, R, Julia, and many others.

Hopefully this will help bridge the gap between these languages. As much as I love Python, R offers a much larger selection of statistical models. Julia, on the other hand, offers the compiled-like speed that’s necessary for some types of numerical problems that can’t be vectorized easily.
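As an illustration of a computation that resists vectorization, consider a recurrence in which each value depends on the one before it. The logistic map below is an example chosen for this sketch, not one taken from the post, and the function name and parameters are hypothetical.

```python
def logistic_map(r, x0, n_steps):
    """Iterate x[t+1] = r * x[t] * (1 - x[t]) and return the whole trajectory."""
    trajectory = [x0]
    for _ in range(n_steps):
        x = trajectory[-1]
        # Each value depends on the one just computed, so the steps
        # cannot be evaluated all at once as a single array operation.
        trajectory.append(r * x * (1.0 - x))
    return trajectory

traj = logistic_map(r=2.0, x0=0.5, n_steps=5)
print(traj)
```

In pure Python this loop runs one interpreted step at a time, which is where a compiled language or a JIT-compiled one tends to pay off.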

coLaboratory allows multiple people to collaborate on Jupyter notebooks. The notebooks are hosted on Google Drive; execution is handled either by a Jupyter kernel running in Chrome or by a kernel on the host computer. It’s not as streamlined as it could be right now, but it has a ton of potential, as it does not require the user to have IPython or Jupyter installed on their computer – only the Chrome extension is required.

Closing thoughts

There are other features of the IPython notebook that I haven’t covered in detail here. For instance, the notebook interface can be used to manage local or remote kernels for parallel computing.

The notebook interface has grown a lot in the last 2 years, and it’s quite useful for day-to-day work. Jupyter, when it matures, will bring seamless support for multiple languages. Furthermore, IPython, matplotlib, and others will benefit greatly from the exponential growth in data science: they’re now backed by corporate sponsors, so contributors can devote themselves to these projects full time.

Maybe not today, maybe not tomorrow, but soon, we can finally leave Matlab and say once and for all: I will never code another GUI in GUIDE again!

Be like Mike: Michael Jordan’s reading list

Not this Michael Jordan, that Michael Jordan.

There’s a machine learning reading list by Michael Jordan that’s been floating around on Hacker News for a few years, and in a recent AMA he added a few more. Full list:

Lots of theoretical stuff, to which we might want to add the more applied classics, namely Bishop, MacKay, Murphy, and Tibshirani. How many can you check off?

 

Extract data directly from websites with import.io

I’ve discussed before how to extract data from static published graphs in Matlab. But what if you wanted to extract table data from a website that doesn’t have an API?

import.io does just this. You show a few examples of the data you want to extract from a site, and it guesses intelligently how to extract other data points from the same or similar pages. You can then access that data as JSON through a REST API, or just download a .csv file.
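The two output routes, JSON and .csv, carry the same tabular data, and moving between them takes only the standard library. The sketch below uses a made-up payload; its field names and layout are assumptions for illustration and do not reflect import.io’s actual response format.

```python
import csv
import io
import json

# Made-up payload standing in for an API response; the field names and
# layout are assumptions for illustration, not import.io's real schema.
payload = '[{"rank": 1, "title": "Album A"}, {"rank": 2, "title": "Album B"}]'

rows = json.loads(payload)

# Write the rows out as CSV, using the keys of the first row as the header.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```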

You could use it to extract data tables from Wikipedia, government agencies, commercial websites, etc. for legit scientific purposes. Better yet, you can extract top 1000 lists from, say, Les Inrocks, and get them into Spotify for some infinite lab playlists.

 
