As a scientist, interacting with data allows you to gain new insight into the phenomena you’re studying. If you read the New York Times or the D3 docs, or browse Distill, you’ll see impressive browser-based visualizations: interactive storytelling that not only represents data accurately but also draws your attention to its surprising aspects. Making good visualizations of your work can make your research accessible to a wide audience. My favorite example here (already several years old) is the semantic map of the brain from James Gao, Alex Huth, Jack Gallant and crew. At the time, visualizing brain map data in the browser was unheard of, and here was a visualization of how semantics map to different brain areas, in the browser, no downloads necessary.
A big wall you’ll run into is that modern web development is vast. Getting to proficiency is hard: it’s very easy to get discouraged and fall off the wagon before you get to build something interesting. What I’ve assembled here is a roadmap so that you can start building interesting dynamic visualizations with the lowest possible barrier to entry, while building a self-reinforcing skillset.
| Technology | Application areas | Pros | Cons | What you learn |
|---|---|---|---|---|
| Python-based generators | Dashboards, lightweight interactions | Easy distribution, low barrier to entry | Limited interactivity | Grammar of interactive graphics |
| Jupyter notebook | Daily data exploration | You’re already using it! | Distribution not straightforward | Snippets of HTML, JS, SVG |
| Walled garden | All-in-one visualizations (widgets) | Limited things to learn | What you learn is non-transferable | The walled garden itself |
| JS notebooks | Literate programs | REPL environment, good docs | Learning curve a bit steep | Core JS plus visualization libraries |
| The web | Web dataviz | You control everything | Sprawling tech stack, learning curve quite steep | Packaging, components, deployment |
At the end of this article, I cover JS libraries you might want to learn for plotting or numeric work. I’ve built this roadmap from a year of experimenting with about half a dozen interactive visualization methods; of course, if you know of better curricula, books or tutorials, please let me know in the comments.
Step 0: Python-based libraries
Your first option is to not code for the web at all, and instead leverage pure Python-based libraries. Many libraries fill the niche of everyday interactive graphics with minimal interactions: hover effects, tooltips, interactive legends, etc. These include Plotly, Bokeh with HoloViews, and Altair (built on Vega-Lite). On the mapping side, ipyleaflet is worth mentioning.
This ecosystem has grown explosively in the past few years. Together with the dashboarding tools below, these libraries fill the same niche that the ubiquitous Shiny fills in the R ecosystem: a way for data science teams to create dashboards. You can nevertheless use them for interactive scientific visualizations. Getting familiar with this kind of visualization will help you build a grammar of interactive graphics and rapidly prototype ideas for more fully interactive visualizations.
Here’s what to expect: a Python script defines a visualization (plots, videos, images, maps, etc.) along with widgets, such as sliders and combo boxes. The whole thing is served by a server that caches requests intelligently: moving a slider triggers a request to the server, and the response refreshes the visualization. Some custom visualization components may be written in JS. Dashboard solutions are often associated with a specific plotting library, so the docs are written in terms of that library, but you generally have some flexibility to use other frameworks.
One clarification: if you want to share your visualizations, you’ll need a server. All of these solutions have a basic open-source offering that lets you deploy such a server yourself, e.g. on a cloud provider. If that sounds messy, you can instead opt for a managed cloud service, which often comes with a free tier plus paid tiers with more capabilities. Here are the main options:
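To make that request-and-cache loop concrete, here is a library-free sketch in plain Python. The names compute_curve and on_slider_change are hypothetical stand-ins for what a dashboard framework would wire up for you, and functools.lru_cache plays the role of the server-side cache; the real frameworks ship their own caching helpers (Streamlit’s st.cache_data, for instance).

```python
import math
from functools import lru_cache


# Hypothetical server-side computation for one slider value. lru_cache stands in
# for the framework's request cache: work is only done the first time a given
# slider value is requested.
@lru_cache(maxsize=128)
def compute_curve(frequency: float, n_points: int = 50) -> tuple:
    """Sample sin(2*pi*f*t) at n_points evenly spaced times in [0, 1)."""
    return tuple(
        (i / n_points, math.sin(2 * math.pi * frequency * i / n_points))
        for i in range(n_points)
    )


def on_slider_change(frequency: float) -> dict:
    """Hypothetical callback: turn a slider value into a JSON-ready plot spec."""
    points = compute_curve(frequency)
    return {
        "title": f"frequency = {frequency}",
        "x": [t for t, _ in points],
        "y": [y for _, y in points],
    }


# Simulate a user dragging the slider back and forth between two values.
for value in [1.0, 2.0, 1.0, 2.0]:
    spec = on_slider_change(value)

print(compute_curve.cache_info())  # CacheInfo(hits=2, misses=2, maxsize=128, currsize=2)
```

Only the first request for each slider value does any work; revisiting a value is served from the cache, which is what keeps these dashboards responsive.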
| Product | Deployment | Peculiarities | Associated plotting library |
|---|---|---|---|
| Panel | DIY | Works inside Jupyter and standalone | Bokeh, with support for other charting libraries |
| Streamlit | DIY & commercial | Very easy deploys | Agnostic |
| Dash | DIY & commercial | Supports lots of customization, aimed at enterprise users | Plotly, with support for other libraries |
Judging by GitHub stars, Streamlit seems to have the most momentum of these.
See this article for a detailed breakdown.
Step 1: Build for Jupyter notebooks
_repr_html_ and IPython.core.display
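When you leave a variable as the last expression of a Jupyter cell, how does Jupyter know how to display it? The answer is that it looks for a `_repr_html_` method on the object; if there is one, Jupyter renders the HTML string it returns. That HTML can use any web technology the browser understands: CSS, SVG, even scripts.
You can also trigger a rich display from anywhere inside a cell, not just at the end, using the IPython display API. Again, you can include any web technology, including `<script>` tags inside the HTML.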
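For example, here is a minimal sketch of a hypothetical `Histogram` class that Jupyter would draw as an inline SVG bar chart, with no plotting library involved. The class and its parameters are made up for illustration; only the `_repr_html_` method name is the hook Jupyter looks for.

```python
class Histogram:
    """Hypothetical toy object that Jupyter renders as an inline SVG bar chart."""

    def __init__(self, counts, bar_width=20, height=100):
        self.counts = list(counts)  # assumed non-empty
        self.bar_width = bar_width
        self.height = height

    def _repr_html_(self):
        # Jupyter calls this when the object is displayed (e.g. as the last
        # expression in a cell) and renders the returned string as HTML.
        peak = max(self.counts) or 1  # avoid dividing by zero when all counts are 0
        bars = []
        for i, count in enumerate(self.counts):
            h = self.height * count / peak
            bars.append(
                f'<rect x="{i * self.bar_width}" y="{self.height - h:.1f}" '
                f'width="{self.bar_width - 2}" height="{h:.1f}" fill="steelblue">'
                f"<title>{count}</title></rect>"  # <title> shows as a hover tooltip
            )
        width = self.bar_width * len(self.counts)
        return f'<svg width="{width}" height="{self.height}">{"".join(bars)}</svg>'
```

Leaving `Histogram([3, 1, 4, 1, 5])` as the last line of a cell should draw five bars, each showing its count when you hover over it.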
```python
from IPython.display import display, HTML

# Render an HTML fragment from anywhere in a cell, not just the last line.
display(HTML('<h1>Hello, world!</h1>'))
```
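Going one step further, the sketch below builds an HTML string that embeds Python data as JSON inside a `<script>` tag, which then draws a scatter plot as SVG in the browser. The function name `scatter_html` is hypothetical, and it assumes numeric `xs` and `ys`. Whether inline scripts actually execute depends on your frontend and its trust settings: the classic Notebook runs them, while other frontends may sanitize them.

```python
import json
import uuid


def scatter_html(xs, ys, size=200):
    """Return HTML whose inline script draws (x, y) points as SVG circles."""
    div_id = f"scatter-{uuid.uuid4().hex}"  # unique per call, so outputs don't collide
    data = json.dumps([{"x": x, "y": y} for x, y in zip(xs, ys)])
    # Doubled braces {{ }} are literal JS braces inside the Python f-string.
    return f"""<div id="{div_id}"></div>
<script>
(function () {{
  const data = {data};
  const ns = "http://www.w3.org/2000/svg";
  const svg = document.createElementNS(ns, "svg");
  svg.setAttribute("width", {size});
  svg.setAttribute("height", {size});
  const xs = data.map(d => d.x), ys = data.map(d => d.y);
  const x0 = Math.min(...xs), dx = (Math.max(...xs) - x0) || 1;
  const y0 = Math.min(...ys), dy = (Math.max(...ys) - y0) || 1;
  for (const d of data) {{
    const c = document.createElementNS(ns, "circle");
    c.setAttribute("cx", 10 + (d.x - x0) / dx * ({size} - 20));
    c.setAttribute("cy", {size} - 10 - (d.y - y0) / dy * ({size} - 20));
    c.setAttribute("r", 4);
    svg.appendChild(c);
  }}
  document.getElementById("{div_id}").appendChild(svg);
}})();
</script>"""
```

In a cell, `display(HTML(scatter_html([0, 1, 2, 3], [0, 1, 4, 9])))` should render four points. Generating a fresh id on every call keeps several outputs in the same notebook from drawing into each other’s `div`.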
ipywidgets and interact
“ipywidgets, also known as jupyter-widgets or simply widgets, are interactive HTML widgets for Jupyter notebooks and the IPython kernel. Notebooks come alive when interactive widgets are used. Users gain control of their data and can visualize changes in the data. Learning becomes an immersive, fun experience. Researchers can easily see how changing inputs to a model impact the results.” (the ipywidgets manual)
Next in line in complexity are ipywidgets. Like the rich displays above, they can render HTML; unlike them, widgets also have events that call back into Python. See this notebook for an example. You can use them, for instance, to navigate through a plot with sliders. In fact, this pattern is so common that there’s a convenience function called `interact` that handles simple slider-based interactions for you.
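A minimal `interact` sketch follows, assuming ipywidgets is installed and the code runs in a notebook. The `describe_wave` function is hypothetical and prints a crude text sparkline so the example has no plotting dependency; in practice you would draw with Matplotlib or another plotting library inside the function. Each `(min, max, step)` tuple passed to `interact` becomes a slider, and moving a slider calls the function again with the new value.

```python
import math


def describe_wave(frequency=1.0, n_points=50):
    """Sample sin(2*pi*f*t) on [0, 1) and print it as a crude text sparkline."""
    ys = [math.sin(2 * math.pi * frequency * i / n_points) for i in range(n_points)]
    ramp = " .:-=+*#"  # 8 characters, from lowest to highest value
    # Map each y in [-1, 1] to an index in 0..7 of the ramp.
    print("".join(ramp[min(int((y + 1) / 2 * len(ramp)), len(ramp) - 1)] for y in ys))
    return ys


try:
    from ipywidgets import interact
except ImportError:  # ipywidgets not installed, e.g. outside a Jupyter environment
    interact = None

if interact is not None:
    # Float tuple -> FloatSlider, int tuple -> IntSlider; each change re-runs describe_wave.
    interact(describe_wave, frequency=(0.5, 5.0, 0.5), n_points=(10, 100, 10))
```

The import guard only lets the function be reused outside Jupyter; inside a notebook, the `interact` call is the whole user interface.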
Building on Jupyter widgets, Voilà lets you release fully fledged dashboards based on Jupyter notebooks. The idea is that you write a regular Jupyter notebook, then serve it on the web through a Tornado server. The end user sees a dashboard made of the notebook’s output cells, without the code. You can deploy this on Heroku or Google App Engine to make your application accessible to the world.
Step 2: Walled gardens
Unity is a game development engine and IDE. It’s a great environment to build offline games and VR visualizations. It can also compile exported scenes and C# scripts to WebAssembly (WASM), so projects run in the browser. C# is a modern language with operator overloading, and 2D and 3D interaction are easily accommodated. Cons: widgets (sliders, input boxes, etc.) must be created in the Unity interface and are non-native. The Unity applet acts as a walled garden, making it difficult to interact with the native DOM. The C# code cannot be compiled on its own with a command-line tool: it must be compiled together with the project in the Unity editor, and projects weigh several GB, which makes them impractical to track on GitHub. Furthermore, the math libraries available in C# are fairly limited. Overall, Unity is a limited option for the web, though it has considerable appeal if you’re doing 3D visualization.
p5.js and OpenProcessing
For many, I suspect this will be the right level of complexity. In practice, using JS notebooks means you will need to export your data out of Python in order to visualize it; for many use cases, JSON will do the trick.
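A minimal sketch of that export step, using only the standard library: it turns hypothetical column-oriented data (the kind you might keep in a dict of lists) into a JSON array of row objects, a shape most JS plotting libraries accept. One gotcha worth guarding against: Python’s json module writes NaN by default, which is not valid JSON, so the browser’s JSON.parse rejects the file.

```python
import json
import math


def to_records_json(columns: dict) -> str:
    """Serialize {"name": [values, ...]} as a JSON array of row objects.

    Assumes every column has the same length.
    """
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    records = []
    for i in range(n_rows):
        row = {}
        for name in names:
            value = columns[name][i]
            # JSON has no NaN or Infinity: map them to null so JSON.parse accepts the output.
            if isinstance(value, float) and not math.isfinite(value):
                value = None
            row[name] = value
        records.append(row)
    # allow_nan=False raises ValueError instead of silently writing invalid JSON.
    return json.dumps(records, allow_nan=False)


trials = {"trial": [1, 2, 3], "rate_hz": [12.5, float("nan"), 9.75]}
print(to_records_json(trials))
# [{"trial": 1, "rate_hz": 12.5}, {"trial": 2, "rate_hz": null}, {"trial": 3, "rate_hz": 9.75}]
```

Write the string to a file and upload it to your JS notebook. If you already use pandas, `DataFrame.to_json(orient="records")` produces the same row-oriented shape.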
Step 4: the web
One very annoying gotcha you’ll run into is the packaging difference between JS built for Node and JS built for the web. Many packages you’ll use for data analysis in JS, like mathjs, were originally built for the Node server environment. To use them in the browser you need to build them, which means learning about build tools, and pretty soon you’ve spent a couple of days figuring out your transpiler. There are now cloud transpilers, in particular Skypack and JSPM, that compile Node packages for you so you can use them immediately without a complex toolchain. That means you can import npm packages in browser JS much the same way you would import pip-installable packages in Python:
```javascript
// Inside a <script type="module">: load mathjs as an ES module from the CDN, no build step.
import * as mathjs from 'https://cdn.skypack.dev/mathjs';
console.log(mathjs.add(1, 2)); // 3
```
Thanks to Guido Zuidhof for patiently explaining this to me. For deployment, provided you don’t rely on external services, you can use GitHub Pages or a static hosting service like Netlify. If you need to serve data dynamically, you may need to both create and deploy a data source behind a REST API. Something lightweight like Flask deployed on Heroku would do the trick.
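For the data-source case, here is a minimal sketch of a JSON REST endpoint. It swaps Flask for Python’s built-in http.server so that it runs without dependencies; the `/api/<name>` route, the `limit` query parameter and the `DATASETS` contents are all hypothetical. The Access-Control-Allow-Origin header matters when your page (say, on GitHub Pages) fetches from a different origin: without it, the browser blocks the response.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory data; in practice this might be loaded from disk or a database.
DATASETS = {
    "trials": [
        {"trial": 1, "rt_ms": 412},
        {"trial": 2, "rt_ms": 388},
        {"trial": 3, "rt_ms": 455},
    ],
}


def handle(path: str) -> tuple:
    """Route /api/<name>?limit=N to a (status code, JSON body) pair."""
    url = urlparse(path)
    parts = url.path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "api" or parts[1] not in DATASETS:
        return 404, json.dumps({"error": "not found"})
    rows = DATASETS[parts[1]]
    limit = parse_qs(url.query).get("limit")
    if limit:
        try:
            rows = rows[: int(limit[0])]
        except ValueError:
            return 400, json.dumps({"error": "limit must be an integer"})
    return 200, json.dumps(rows)


class JSONHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        # Let pages served from another origin fetch() this API.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Block forever, serving the API on the given port."""
    HTTPServer(("", port), JSONHandler).serve_forever()
```

Call `serve()`, and a page can then `fetch("http://localhost:8000/api/trials?limit=2")`. Keeping the routing in a plain `handle` function makes it easy to test and to port to Flask later. The Python docs advise against using http.server in production, so this is only for local prototyping.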
As you’re going through this process, it will often feel like the garden of forking paths: seemingly infinite decisions to make. However, there’s a small core set of JS libraries you might interact with:
- D3: data-driven documents in JS: https://d3js.org/
- mathjs: math, vectors, matrices in JS: https://mathjs.org/
- stdlib: a standard library for JS, including distributions: https://github.com/stdlib-js/stdlib
- Vega-Lite: declarative plots in JS: https://vega.github.io/vega-lite/
- TensorFlow.js: tensor computation library in the browser: https://www.tensorflow.org/js
- Three.js: 3d visualizations for the web: https://threejs.org/
- Arquero: like dplyr, for the web: https://github.com/uwdata/arquero
I think this is enough of a curriculum to carry you from 0 to 1 over a six-month period. Practice, learn, and in no time you’ll be ready to make solid visualizations!
- Iodide blog post (Iodide is deprecated, but the background and resources linked are still relevant)
- Js4DS – comprehensive intro to JS for data science (long! doesn’t talk about d3)
- Recent Pyodide overview on Hacker News
- Recent Observable thread on HN
- Distill (for inspiration)
- JS and the next decade of data programming
- Observable top notebooks
- The modern JS tutorial
- Learn JS data
- Full stack D3 (a paid, structured 8 week course)
- PyViz – comprehensive reference for Python visualization packages