Calling R from Matlab – flat file communication

In the last post I showed how to fit a Bayesian hierarchical model in R to estimate experimental parameters. Our main data analysis pipeline uses Matlab, and we’d like to integrate these two environments for the purpose of analysis.

The example involved JAGS, which does have a direct Matlab interface in addition to the more common R one, but more generally you might need a package for an analysis that only exists in R. So how can you pipe your data from Matlab to R and back?

Flat file communication

There are a variety of unofficial interfaces between Matlab and R – you can call R from Matlab on Windows via a COM extension, or you can spawn Matlab engines and manipulate .mat files in R – but the very simplest communication involves writing your data in flat files, and loading them on the other end.

The csvwrite and csvread functions in Matlab, and the corresponding write.csv and read.csv functions in R, are really all you need. Be aware, however, that Matlab's built-in CSV reading function doesn't support headers or string variables. You can pick up this fancier version from the Matlab File Exchange that does.

Matlab side

Matlab allows you to run terminal commands using the ! operator, or the OS-specific dos and unix commands. Thus, writing out some data, running an R script, and then reading back the data R produces is as simple as:

csvwrite('in.csv',[originaldat]);   % write the raw data for R to pick up
!R CMD BATCH /home/username/Documents/script.r
dat = csvread('out.csv',1,1);       % read R's output, skipping write.csv's header row and row-name column

Matlab will block until R returns, so you don’t need to worry about asynchronous callbacks.

R side

On the R side, you need a batch script that reads the data, transforms it, and then writes it back. For our previous example, this would look like:

rawdat <- read.csv('in.csv',header=FALSE)

#Extract the data we need
ms <- rawdat[,1]
stds <- rawdat[,3]

#Run the MCMC model
library("rjags")
library("coda")
jagresult <- jags.model('slices.model',data=list('ms'=ms,'stds'=stds,'N'=length(ms)),n.adapt = 10000)
thesamps <- coda.samples(jagresult,c('mu','tau','realmean'),n.iter=100000,thin=10,progress.bar="gui")

#Capture the posterior summary statistics and dump to the out file
data <- data.frame(summary(thesamps)$statistics)

write.csv(data,'out.csv')

And that’s pretty much all there is to it. You can use a similar process to call Python from Matlab. See also this post about reading pickle files in Matlab.
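
For the Python case, the script side of the same round trip might look something like the sketch below (the file names and the placeholder computation are just for illustration; Matlab would invoke it with something like !python script.py):

import numpy as np

# Read the matrix that Matlab wrote with csvwrite
rawdat = np.loadtxt('in.csv', delimiter=',')

# Placeholder analysis: column means, standing in for whatever Python-only step you need
result = rawdat.mean(axis=0, keepdims=True)

# Write the result back for Matlab to pick up with csvread
np.savetxt('out.csv', result, delimiter=',')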


Bayesian mixed effects model to estimate experimental parameters

I ran into a tricky problem while analyzing a series of experiments we recently performed with Scanbox. We’re trying to estimate the angle of the objective with respect to the cortical surface.


The way we approach this is by scanning at a given depth, going down a little bit, scanning again, and so on. By measuring the displacement of the mean images from slice to slice we can get an estimate of the angle of the objective with respect to the cortical surface.

By estimating the per-frame displacement in a stack of mean calcium images, we can estimate the angle of the objective with respect to the cortical surface. Here, the displacement is towards the bottom left, which means the xz angle is negative.
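
To make the geometry concrete, here's a minimal sketch of the calculation in Python (the displacement values and z step are made up for illustration): the xz angle is just the arctangent of the lateral shift of the mean image per unit of depth.

import numpy as np

# Hypothetical values: lateral (x) shift of the mean image between consecutive slices,
# as estimated by registration, and the depth step between slices (both in microns).
dx_um = np.array([-2.5, -2.8, -2.3, -2.6])   # negative: the image shifts towards the left
dz_um = 10.0

# The xz angle of the objective relative to the cortical surface is the slope of
# lateral displacement versus depth, converted to degrees.
angle_deg = np.degrees(np.arctan(dx_um.mean() / dz_um))
print(angle_deg)   # roughly -14 degrees for these made-up numbers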

It’s important to get a good estimate of the depth of a recorded ROI to understand how computations are elaborated by different layers of neurons. The depth varies within an imaging field as a function of the angle of the objective with respect to the cortical surface, so we need a good estimate of that angle.

We estimated the xz angle in 9 different experiments by measuring the displacement of the imaging field across consecutive slices. Here they are with error bars:

It’s pretty clear here that the data cluster at around -15 degrees, but it’s noisy. Some of the variation is due to legitimate differences between samples – we aim to perform similar surgeries and injections every time, but it’s biology, so there’s always going to be some variation – and some of it is due to measurement noise.

The very noisy observations are due to the sample moving, e.g. during grooming. It's easy enough to say "well, you should have gotten the data right the first time", and of course that's what we always aim for, but we're not going to scrap days' worth of data because we got unlucky for a few minutes. Rather, to get an optimal per-experiment estimate of the angle, we'd like to shrink the noisier observations towards the weighted population mean.

This is a perfect fit for a Bayesian hierarchical mixed effects model. There’s a fixed effect (the mean angle across experiments), and there’s a random effect (the per-experiment angle):

\theta_i \sim Normal(\theta,\sigma^2) \newline  y_i \sim Normal(\theta_i,\sigma_i^2)

We’d like to compute some weighted mean of these two effects to come up with an optimal estimate of the angle per experiment, i.e. get estimates of \theta_i.
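
Concretely, conditional on the population mean \theta and the variances, the posterior mean of each \theta_i is a precision-weighted average of that experiment's own estimate and the population mean – the standard shrinkage result for the Normal-Normal model:

\hat{\theta}_i = \frac{y_i/\sigma_i^2 + \theta/\sigma^2}{1/\sigma_i^2 + 1/\sigma^2}

so the noisier an experiment (the larger \sigma_i), the more strongly its estimate is pulled towards the population mean.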

This maps exactly onto the per-school mean SAT example from Gelman et al., chapter 5, a two-level hierarchical model for Normal-distributed means. I wrote up how to get this model going in R about 4 years ago. I fired up the model in RStudio – the standard IDE for R these days, rather than RKWard – and it works as is. Here's what the shrunken estimates look like:

[Figure: shrunken per-experiment estimates; experiment number vs. estimated angle (°)]

Much better. The final per-experiment estimates vary between -17 and -12 degrees, which is pretty reasonable considering the biology. With this, we have a principled estimate of the objective angle that optimally pools the across-experiment mean and the per-experiment measurements.


Scientific Python in the browser: ipython notebook

By far the most popular post on this blog is a review of several Python integrated development environments (IDEs) geared toward science. Coming from a Matlab background, it’s natural to search for something Matlab-like to replace it – an IDE with integrated editor, code execution, plotting, benchmarking, file management, etc.

An increasingly attractive alternative is the IPython notebook. The notebook interface, which runs in the browser, lets you write and run interactive notebooks that seamlessly combine code, documentation – including Markdown and LaTeX equations – and interactive output. It's not unlike Mathematica, Maple, or RMarkdown.

You can try out an interactive ipython notebook session on nature.com.

I covered ipython notebook a couple of years ago, back when it was a relatively new tool, but it’s become a lot more powerful as it has matured and its ecosystem has grown. It has become an excellent tool for running and documenting exploratory analyses, while the rest of the Python ecosystem – IDEs, IPython console, debugging tools, etc. – can be leveraged for batch processing and standardized analyses.

Here I highlight some of the more advanced features of ipython notebook with particular focus on recently added features.

Reporting

Ipython notebook particularly shines for creating narrative reports – a form of literate programming which is an excellent workflow for data analysis.

A narrative report mixes code, plots, and a text narrative that highlights results, non-results, thoughts, and concerns. A first draft of a narrative report might sound like stream-of-consciousness beat poetry meets data analysis. A bit of editing tightens the narrative and serves to aggregate and summarize one's thoughts – map-reduce for the brain. The report can then be used for self-archival and sharing insights with other team members. It can also be used to support open science.

Matlab offers this possibility with cell-mode publishing, but the ipython notebook is leaps and bounds above Matlab’s report generation. The notebook interface is particularly well-adapted for narrative reports, as it transparently mixes code, plots, printouts, text, Markdown, and LaTeX.

An IPython notebook – which includes code, text, and the results of computations – is essentially a JSON-formatted file with an .ipynb extension. An .ipynb file can be shared, stored, viewed, and converted in a number of ways.
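
Because it's just JSON, you can also poke at a notebook file directly from Python. Here's a quick sketch – the file name is hypothetical, and the exact top-level keys depend on the notebook format version:

import json

# Load a notebook file directly as JSON and look at its structure
with open('analysis.ipynb') as f:      # any .ipynb file; the name is just for illustration
    nb = json.load(f)

print(nb['nbformat'])                  # notebook format version
print(len(nb['cells']))                # number of cells (format 4; format 3 nests cells under 'worksheets')
print(nb['cells'][0]['cell_type'])     # e.g. 'markdown' or 'code'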

Interactivity

A recent addition to the notebook interface is interactivity with widgets. At its most basic, one can link a slider widget with a callback to a function to create an interactive plot, e.g.:

%matplotlib inline

import pylab as plt
import numpy as np
from IPython.html.widgets import interact
import math

def plot_sine(period=10):
    # Plot a 0-10 window of a sine wave with the given period
    x = np.linspace(0, 10, 100)
    plt.plot(x, np.sin(x * 2 * math.pi / period))

# interact() builds a slider for `period` over (2, 20) in steps of 0.5
# and redraws the plot whenever the slider moves
interact(plot_sine, period=(2, 20, .5))


Other JavaScript-based widgets are available, including buttons, checkboxes, dropdowns, and various containers. The documentation is a bit anemic at this point, the widgets can be buggy, and interactive plots don't work in the notebook viewer, but hopefully these are growing pains that will be fixed in new versions. Nevertheless, it's usable – I recently used it to interactively tag data series as same/different with buttons – and it has a lot of room to grow.
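
For a taste of the non-slider widgets, here's a rough sketch of that kind of tagging setup (the widget class names have moved around between IPython versions, so treat the import as approximate):

from IPython.html.widgets import Button   # ButtonWidget in older IPython releases
from IPython.display import display

labels = []   # collect one same/different tag per button press

def tag_same(_):
    labels.append('same')

def tag_different(_):
    labels.append('different')

same_button = Button(description='same')
diff_button = Button(description='different')
same_button.on_click(tag_same)
diff_button.on_click(tag_different)
display(same_button, diff_button)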

Multi-language support

IPython and the notebook interface have proven so popular that a recent effort has been made towards supporting multiple languages, both at the command line and in the notebook interface. The Jupyter project, which, as I understand it, is a fork/superset/new version of IPython, supports interactive notebooks for Python, R, Julia, and many others.

Hopefully this will help bridge the gap between these languages. As much as I love Python, R has a far larger selection of statistical models available. Julia, on the other hand, offers the compiled-like speed that's necessary for some types of numeric problems that can't be vectorized easily.

coLaboratory allows multiple people to collaborate on Jupyter notebooks. The notebooks are hosted on Google Drive – the execution is handled either by a Jupyter kernel running in Chrome or a kernel on the host computer. It's not as streamlined as it could be right now, but it has a ton of potential, as it does not require the user to have ipython or jupyter installed on their computer – only the Chrome extension is required.

Closing thoughts

There are other features of the ipython notebook that I haven’t covered in detail here. For instance, the notebook interface can be used to manage local or remote kernels for parallel computing.
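
As a rough sketch of what that looks like (this assumes a local cluster has already been started with `ipcluster start`; the module has since been split off into ipyparallel):

from IPython.parallel import Client   # `ipyparallel` in later versions

rc = Client()      # connect to the running controller
dview = rc[:]      # a direct view on all available engines

def square(x):
    return x ** 2

# Run the function across the engines and gather the results
print(dview.map_sync(square, range(16)))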

The notebook interface has grown a lot in the last 2 years, and it's quite useful for day-to-day work. Jupyter, when it matures, will bring seamless support for multiple languages. Furthermore, ipython, matplotlib, and others will benefit greatly from the exponential growth of data science, which means that they're now backed by corporate sponsors and that contributors can be devoted to these projects full time.

Maybe not today, maybe not tomorrow, but soon we can finally leave Matlab and say once and for all: I will never code another GUI in GUIDE again!

More resources:
