Building Neuromatch Academy

Neuromatch Academy 2020 is a three-week online summer school in computational neuroscience that took place in July of 2020. We created interactive notebooks covering all aspects of computational neuroscience – from signals and models of spikes to machine learning and behaviour. We hired close to 200 TAs to teach this material to 1,700 students from all over the world in an inverted classroom setting. Over 87% of students that started the class finished it, a number unheard of in the MOOC space.

By now, we have a preprint out, and NMA has been covered in Nature and the Lancet. So how did we do it? Building something that goes from 0 to several hundred volunteers working tirelessly is a huge endeavor, and I’m sure a lot of the readers of this blog coming from an academic background have trouble even imagining how you can get something like that off the ground. Richard Gao wrote a great post on how it felt to be flying in this anarchist hacker spaceship as it was being built. I wanted to share some thoughts with you that I hope will be helpful if you ever want to create an experience at this scale.

A little background: my stint in teaching

After my time at Facebook, I wanted to focus on the academic route. I became interested in teaching and taught an introductory CS class at a local college, building my materials from scratch. I thought, rather naively, that since I knew CS, this would be fairly straightforward. But preparing course materials for beginners is anything but easy. One issue is the curse of expertise: if you know a subject pretty well, it becomes harder to explain it to a novice. For example, here’s my explanation of globals in Python:

Python has two scopes: module and function. You have read access to module variables inside of a function. However, if you want to write to a module variable, you need to use the global keyword. Unless the variable is a reference type… If a variable is a dict or a list, for example, you can change its contents inside a function even if it’s a module variable. That’s because Python has pass-by-ref semantics for complex types. You see, default pass-by-ref semantics for complex types were originally introduced in fourth generation languages to get rid of the difficulty of granular pass-by-value or pass-by-ref semantics in languages like C. […many rambling minutes later…]. In any case, don’t use globals if you can avoid them.
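For readers who want to see the behaviour that explanation is gesturing at, here is a minimal sketch. The variable and function names are made up for illustration; only the `global` keyword and the scoping rules themselves are Python's.

```python
counter = 0   # module-level number
items = []    # module-level list


def read_counter():
    # Reading a module-level name inside a function needs no declaration.
    return counter


def broken_increment():
    # Assigning to `counter` makes it local to this function, so reading it
    # on the right-hand side fails with UnboundLocalError.
    counter += 1


def increment():
    global counter  # opt in to rebinding the module-level name
    counter += 1


def add_item(item):
    # Changing the list's contents does not rebind the name `items`,
    # so no `global` declaration is needed.
    items.append(item)


increment()
add_item("spike")
try:
    broken_increment()
    rebinding_failed = False
except UnboundLocalError:
    rebinding_failed = True

print(read_counter(), items, rebinding_failed)  # prints: 1 ['spike'] True
```

Even this short version shows how many separate ideas (reading, rebinding, mutating) hide behind a single sentence about globals.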

Although this explanation is technically correct, it doesn’t fit with novice students’ current development, and giving that explanation is more likely to make them feel inadequate than enlightened. It takes a tremendous amount of attention to detail, radical empathy and craftsmanship to understand the adult learner’s motivations – both intrinsic and extrinsic – and to create materials that engage them at the right level. And creating this stuff takes a huge amount of time! By the time I finished the class, I was ready to throw away the material I created. At the same time, it felt quixotic to take half a year off to craft the perfect materials for a class that was going to be taken by ~15 students a semester. The lack of good materials meant that I had to spend much of my time in class on adjusting materials dynamically – at the end of the day, I was completely exhausted. Surely, there must be a better way!

In response to the global pandemic, Chris Piech and Mehran Sahami decided to offer the first half of the introductory Stanford CS class to students at large, an experiment called Code in Place. Over 800 section leaders signed up to give 6 weeks of instruction on core computational concepts in Python. Here was a completely different approach to teaching a subject: very high quality materials crafted over years at Stanford – including a robot-control environment in Python called Karel – given in a hybrid setting, with the section leader (aka TA) holding Q&A sessions and solving test problems with students in an inverted classroom setting. And it worked! Students were engaged and stuck around, and they worked on self-directed projects to continue their learning. What I found is that by not having to focus so much on materials prep and the fallout from having so-so materials, I had more time to connect with the students on an emotional, social level – understanding what they’re stuck on, where they’re coming from, what their motivations are, etc.

Karel is an environment where students can learn the basics of programming, including variables, if statements, for loops and decomposition, by controlling a cute robot called Karel on a grid.

A few weeks into Code in Place, I came across Konrad Kording’s Twitter post looking for NMA volunteers. By that time, they had a core team – Megan in the Exec role, Sean as finance guru, Brad as ops, Konrad as dreamer, Gunnar and Paul on curriculum. They had just gotten started and needed people with technical skills. I took on the technical chair, things escalated, and within a few weeks I was CTO and, along with many others, was doing NMA full time. My time in industry proved to be an asset – working with short deadlines, focusing on shipping, and managing relationships with a large interdisciplinary team. Pretty soon, I was helping coordinate dozens of volunteers, a growth experience that I didn’t know I was ready for. So if you’re wondering whether industry is right for you, if you want to build big things and manage big collaborations in the future, I think it’s a great place to get that experience.

Deciding what to build

When you first get started on a large project like this, you get overwhelmed with ideas about what it is that you’re trying to build. One framework for orienting your thoughts is the set of Heilmeier questions:

What are you trying to do? Articulate your objectives using absolutely no jargon.
How is it done today, and what are the limits of current practice?
What is new in your approach and why do you think it will be successful?
Who cares? If you are successful, what difference will it make?
What are the risks?
How much will it cost?
How long will it take?
What are the mid-term and final “exams” to check for success?

If you can answer these questions, you know what you’re building, and you can get a team to congregate around those ideas. This isn’t very different, conceptually, from writing a grant. For us, it was really the fundraising document that formalized a lot of our early brainstorming. The answers to these questions could be written simply as:

We’re building an online, 3-week intensive summer school in computational neuroscience. It will be cheap (100$ or less to attend) and accessible to all.
Traditional summer schools offer a great experience and the chance to create life-long scientific networks, but they are elitist and expensive. MOOCs are cheap and accessible to all but a vanishingly small percentage of people actually complete them because they don’t foster belonging.
We create high-quality videos and interactive notebooks and we deliver them in pods of students matched to TAs. We offer the close-knit experience of traditional summer schools with the accessibility of MOOCs in a new hybrid model. People are bored and alone because of COVID so we have a unique opportunity to show that this online inverted classroom model can work.
We’ll bring cheap high-quality education to thousands of people and will set the standard for open source educational materials for years to come. We can bring the model to many other areas of science.
Execution risk and legal risk
1500$ for each TA, and we’re aiming for 200 TAs
We’re launching July 13th
We’ll track attendance, completion metrics, student surveys, website statistics, etc. to have a 360-degree view of our impact. Our core metric is completion of the class and we aim for 80% completion.

Within each committee, you can take Heilmeier’s questions and consider their relevance to your committee’s duties. The curriculum committee’s goal might be to create high-quality materials for the class, while the technical committee might aim to deliver 99% uptime during the class. The point is to try to have clarity on your goals so that when crunch time happens you can prioritize.

Running on enthusiasm

I was really enthusiastic about taking what I learned from Code in Place and bringing that to computational neuroscience. The core team had been thinking about a new kind of summer school for years before NMA got started and they were ready to tap into their network to jump start the effort. In the early days, being enthusiastic about the project and having a great story to tell are instrumental in finding volunteers. We’re democratizing education! We’re bringing grad-level education to underserved communities! We’re doing something no one’s ever done!

I volunteer at a soup kitchen, and one thing they emphasized during our training is that a volunteer’s time is a gift. You have to have a pipeline that translates a volunteer’s enthusiasm into action, otherwise you’re refusing to accept that gift, and that’s frustrating for the volunteer. At first, we would bring new people into our Slack workspace and expect them to find their way. We realized pretty soon that having people in the Slack is just the first step – you need to have a plan to retain volunteers. This became especially true as the Slack workspace shot up to hundreds, sometimes thousands of messages a day. Empathize with the poor people who have just joined and are being bombarded with way too much information!

For the technical committee, what worked well to engage new volunteers was to do a task triage. There’s a backlog of stuff that needs to be done – update the website, crank out some forms, clean up the GitHub page, prepare the forum, etc. In a synchronous meeting, you go through the backlog and people volunteer to take on different tasks. That means your committee head needs to have tasks partly groomed beforehand. They pull up the Kanban board during the meeting and fill it in – we used Quire, but Asana or even just a Google doc could work equally well. Everybody gets a chance to pitch in, and confusion about the scope of each task can be resolved during the meeting. It’s also a big team-building exercise: you get to see other people pitch in. We applied this method to build several efforts, including bootstrapping the technical team, the video editing team, the waxers team (see later), community managers, etc.

If you prefer the asynchronous route, GitHub issues with a good-for-first-timers label can be useful. Regardless of the software you use, you want to embody openness: you want the list of tasks to be radically transparent and volunteers to feel maximal agency. This is a common model for open source communities that can be fruitfully adapted to education.

Running like clockwork

Assigning tasks of different sizes and priorities in a backlog grooming session is a great way to engage new volunteers, but you will also have time-sensitive tasks. For time-sensitive things, you need a Gantt chart with a detailed day-by-day breakdown of tasks to be done, with somebody accountable for each task (the single-threaded owner). I was taught the arcane art of Gantt charts by my colleagues Keith and Frances at Facebook, alums of Fitbit and the Apple hardware team. Let me tell you something: if you want to get stuff done on time, somebody who knows how to ship hardware will make it happen – when you build consumer products, you have to ship them on time, no matter what.

The arcane art of the Gantt chart

For an online summer school, that means having deadlines for student admissions, TA applications, first drafts of content, second drafts of content, reviews, editing, posting on GitHub, etc. for every single day of content. The deadlines should be visible to everybody, and the people responsible for them should be accountable. Elizabeth DuPre pointed me to this talk by the creator of Elm – in open source projects, the right culture doesn’t just happen by accident, culture happens because norms are defined and reinforced. “Deadlines matter” is a norm that is different from what most people are used to in the context of giving instruction. If you’re giving your own class, you can keep editing your slides up to 2 minutes before the class. You can’t do that at NMA – your slides will be reviewed, your video recording will be edited, your video will be captioned, etc. – if you hand in your slides late, everybody down the pipeline suffers.
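To make the day-by-day breakdown concrete, here is a rough sketch of how per-stage deadlines with a single owner each could be generated by counting back from the teaching date. The stage lead times, the owner names and the `W1D1` label are all made-up assumptions for illustration, not the schedule NMA actually used.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical pipeline stages: (name, days before the content is taught).
STAGES = [
    ("first draft", 28),
    ("second draft", 21),
    ("review", 14),
    ("editing", 7),
    ("posted on GitHub", 2),
]


@dataclass
class Task:
    content_day: str
    stage: str
    owner: str   # the single accountable person for this task
    due: date


def build_schedule(content_day, teach_date, owners):
    """Count back from the teaching date to get one deadline per stage."""
    return [
        Task(content_day, stage, owners[stage], teach_date - timedelta(days=lead))
        for stage, lead in STAGES
    ]


def overdue(tasks, today, done):
    """Tasks whose deadline has passed and whose stage is not finished."""
    return [t for t in tasks if t.due < today and t.stage not in done]


# Placeholder owners; in practice each stage would name a real person.
owners = {stage: f"owner of {stage}" for stage, _ in STAGES}
schedule = build_schedule("W1D1", date(2020, 7, 13), owners)

late = overdue(schedule, today=date(2020, 6, 30), done={"first draft", "second draft"})
for task in late:
    print(task.content_day, task.stage, task.owner, task.due)
# prints: W1D1 review owner of review 2020-06-29
```

Printing the overdue list somewhere everybody can see it is one simple way to make the deadlines visible, which is the point of the norm.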

If you want deadlines to matter you have to will them into existence by making them a norm inside of an organization and reinforcing that by making them visible to everybody. Keeping track of these deadlines eventually led to a daily cadence of standups, which I ran – ironic because I always hated standups in industry. Konrad once called me “a little German” for my insistence on deadlines, which is very funny since I’m almost absurdly disorganized in my personal life. Being inside of a big org is different. It’s an embodiment of a deeper principle of contractualism – we define and agree on what we owe to each other and bind ourselves to that, and that’s what defines morality within our community. The upside of that is the contract is multilateral. When we couldn’t get the ideal matching method between TAs and students to work on time, but we had a “good enough” version that was working, I could say to others that we have to ship the good enough version today, because we agreed to it; it gave me cover, even though it was an unpopular decision. That kind of clarity in planning and expectations avoids a lot of friction later on.

You need good tools to work productively with each other

In The Mythical Man-Month, Fred Brooks claims that “adding [more programmers] to a late software project makes it later”. When you have dozens of contributors to an open-source education curriculum, how do you avoid that? You need good tools. One of the first things that the technical team worked on was a smoke test for notebooks. We were worried that one of the notebooks that we use for teaching might break. Notebooks are designed for exploration and can be pretty brittle – a simple cell execution order inconsistency can break a notebook. If multiple people are editing and pushing notebooks, it’s almost guaranteed that a notebook will break at some point. If a notebook is broken, an individual TA might not have enough context to diagnose the problem, and we would have to broadcast to our 200 TAs a fix, which would be stressful for TA and student alike.

So we started with a really simple smoke test, written by Marco. When you push notebooks to GitHub, the continuous integration kicks in, runs your notebook from scratch, and checks whether it runs. From these humble beginnings, Michael Waskom built an intricate continuous integration (CI) pipeline to check that the code runs, that the style is consistent, and to generate versions of the notebooks for students and TAs, etc. This is the thing that allowed the editors to push better versions of the notebooks on a regular basis, and proved invaluable during a time crunch.
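The idea behind such a smoke test can be sketched in a few lines. This is not the actual NMA test: it is a simplified, standard-library-only illustration that reads the notebook's JSON and runs the code cells top to bottom in a fresh namespace, and it does not handle cell magics or shell commands. A real pipeline would more likely run the notebook through a Jupyter kernel, for example with `jupyter nbconvert --execute`. The function names `smoke_test` and `make_notebook` are hypothetical.

```python
import json
import traceback


def smoke_test(notebook_json):
    """Run every code cell in order in one fresh namespace.

    Returns (ok, message). Simplified: no kernel, no magics, no timeouts.
    """
    notebook = json.loads(notebook_json)
    namespace = {}
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # the format allows a list of lines
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception:
            return False, f"cell {index} failed:\n{traceback.format_exc()}"
    return True, "all code cells ran"


def make_notebook(*cell_sources):
    """Build a tiny in-memory notebook for the demo below."""
    cells = [
        {"cell_type": "code", "source": src, "metadata": {},
         "outputs": [], "execution_count": None}
        for src in cell_sources
    ]
    return json.dumps(
        {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
    )


good = make_notebook("rate = 5.0", "spikes = rate * 2")
bad = make_notebook("spikes = rate * 2", "rate = 5.0")  # cells in the wrong order

print(smoke_test(good)[0])  # prints: True
print(smoke_test(bad)[0])   # prints: False
```

The second notebook is the cell-order problem in miniature: it may have worked in the author's live session, but it fails when run from scratch, which is what the CI run catches.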

Similarly, we used an online video editing tool (WeVideo) rather than an offline one because it allowed multiple people to contribute to video editing. That allowed Tara to oversee a dozen video editors and reassign tasks when necessary.

You need good organization to work productively with each other

The easiest way to get a lot of people to contribute lots of content is to have them work in silos. Each of our instruction days was more or less independent of the others, so it made sense to have different people work on each independently. The downside to this approach is that it makes for a jumbled experience for the student. Novice learners, who don’t yet have proficiency in a subject, can easily be thrown off by a change in notation, nomenclature, or tone. Day leaders created one pagers far ahead of the content so we could diagnose missing prerequisites, incompatible learning objectives and big bumps in difficulty throughout the days. But that wasn’t enough.

The biggest contributors to a smooth experience for the students were pre-pods and waxers. When I was giving an introductory CS class, I could see the looks of confusion on students’ faces. I could adjust the content in realtime (a high-wire experience, do not recommend). Ideally I would have had another test classroom run through the content ahead of time to give me detailed notes so I could improve the content before I gave the class in front of the actual students. That’s exactly what we did with pre-pods: we hired a dozen TAs to test the content 3 weeks before the real class (pods before the real ones; hence pre-pods). They gave 360-degree feedback at the micro level (e.g. typos) and macro level (e.g. changes in difficulty from day to day) which could then be taken into account by the content creators. It’s not adversarial like peer review can sometimes feel – everybody is on the same side. It’s similar to UI/UX testing, where you put your product in front of naive users and watch them destroy it from the other side of the one-way mirror so you can make a better product. This kind of design thinking – ship early, ship often, iterate – can be applied to all aspects of open source education.

The other thing we realized is that writing good notebooks is really hard: it requires the confluence of understanding the content (domain knowledge in comp neuro), programming, and radical empathy towards the learner (caring about andragogy). You also need an understanding of the house style, context across days, and the quirks of the GitHub CI. We needed a dedicated team to polish content, the waxers (we call them editors outside of NMA, but internally the name stuck). Waxers had to have the most stressful job of all, and I would often see messages exchanged on Slack at 3-4 in the morning about content that needed to be ready for the next day. But it worked! The notebooks were a highlight of the NMA experience, and they will keep giving value to the community for a long time.

Ella, Michael, Tara, Madineh and I presented our pipeline for the content in this talk:

Making decisions under uncertainty

How do you make decisions swiftly and effectively in a big org? We had hundreds of volunteers; 134 authors on the curriculum paper alone. 134 very smart people cannot all be of the same opinion at the same time in the face of uncertainty. The first step to making decisions effectively is to separate decision making from execution. It’s very easy to leave a lot of decisions implicit until it’s crunch time. It happened a couple of times that I made a medium-scale decision or encouraged other people to make medium-scale decisions during a daily standup meeting. Big mistake. A lot of people affected by the decisions weren’t in the room; they felt unheard and were now burdened with going to a daily meeting to stay in the loop. Don’t do that – take bigger decisions in separate planning meetings. Every software methodology has some notion of a planning meeting, whether waterfall or scrum or whatever – by the end of that meeting you should be clear about what to do for the next week or month.

One thing that can sap a lot of energy at these meetings is having circular arguments: revisiting the same question in subsequent meetings even though you thought you had come to an agreement. Sometimes, one person thinks a decision was made, while another thinks things were still under discussion. Write down decisions in meeting notes and share them with everyone.

Some orgs in the for-profit sector are hierarchical, so that disagreements get resolved top-down. That doesn’t really work in the non-profit sector, where people are there on a volunteer basis. In non-profit and in open source communities, many orgs thus use consensus-based decision making. I think pure consensus-based decision making is very difficult to get right. I’m a member of a not-for-profit makerspace that’s built on anarchist principles. We use consensus-based decision-making, and it gets rowdy: flame wars on our message board are not infrequent, whether it’s about resource allocation, who gets the space when, or whose responsibility it is to take the trash out. Pure consensus-based decision making can have an insidious effect: people who are not in agreement with the majority are left isolated as “difficult” and “not a team player”. That can breed resentment, which leads to personality clashes, which leads to dysfunction: decision making grinds to a halt.

There’s an alternative to pure consensus-based decision making: disagree and commit. You replace the norm that consensus must be reached with an alternative norm:

everybody’s opinion should be understood
decisions are made based on majority or supermajority
it’s ok to disagree with a decision but we all bind ourselves to the decision

To make sure that you understand somebody’s opinion – especially an opinion you don’t agree with – you can use the Steel Man technique. You restate the strongest version of their argument and engage with that. Oftentimes that will resolve whatever argument you had in the first place, or prepare you to build a hybrid solution. If you don’t come to a resolution, at least everyone will understand that they have been heard and understood. Decisions can take place in realtime in a synchronous meeting, or through a polling app in Slack (if a poll, put a deadline on it so it doesn’t drag on; in either case, quorum should be reached). If somebody disagrees with the decision, the norm says that that’s good and healthy, as long as they commit to the decision like everyone else, so it doesn’t breed resentment in the long term.
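The voting rule itself (quorum first, then majority or supermajority) is simple enough to write down. The sketch below is an illustration only: the function name `decide`, the default quorum of half the eligible voters and the two-thirds supermajority in the example are assumptions, not rules NMA formally adopted.

```python
from fractions import Fraction


def decide(votes, eligible, quorum=Fraction(1, 2), threshold=Fraction(1, 2)):
    """Tally a poll once its deadline has passed.

    votes: mapping of voter name -> True (in favour) or False (against).
    eligible: number of people entitled to vote.
    quorum: minimum fraction of eligible voters who must have voted.
    threshold: fraction of cast votes that must be exceeded to adopt.
    """
    cast = len(votes)
    if cast == 0 or eligible <= 0 or Fraction(cast, eligible) < quorum:
        return "no quorum"
    in_favour = sum(1 for vote in votes.values() if vote)
    if Fraction(in_favour, cast) > threshold:
        return "adopted"
    return "rejected"


# Seven of ten eligible people voted; four were in favour.
poll = {"a": True, "b": True, "c": True, "d": True,
        "e": False, "f": False, "g": False}

print(decide(poll, eligible=10))                            # prints: adopted
print(decide(poll, eligible=10, threshold=Fraction(2, 3)))  # prints: rejected
print(decide({"a": True, "b": True}, eligible=10))          # prints: no quorum
```

Whatever the outcome, the people who voted against are recorded as having disagreed, and the norm is that they commit to the result anyway.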

Be the change you wish to see

My manager at Facebook often talked about positive risk: if you shoot for the moon, sometimes you can actually overshoot and accomplish more than you intended. That always sounded like nonsense to me – I’m a firm believer in Murphy’s law – but I had a chance to witness things going unexpectedly right at NMA. We planned many things, and delivered on what we planned, but perhaps our greatest success was fairly unexpected: pods worked. When we put students together with TAs, and had them interact closely for hours at a time in peer programming, they created bonds. Those bonds tapped into the students’ intrinsic motivation to be part of a community, and they felt belonging. When the going got tough, and they felt overwhelmed, they tapped into that support network to keep them going one more day. That’s how 87% of students managed to finish the class. One TA described the students tearing up on the last day and staying up till late in the morning to say goodbyes. Yes, materials are important, but in andragogy you also need to answer students’ emotional needs. That’s the biggest differentiator between NMA and a MOOC.

Building that experience required a ton of time from many dedicated and passionate people, people that wanted to make a difference. Sometimes emotions ran high and I butted heads – I can be a difficult person and that’s something I need to work on. Sometimes we had legitimate disagreements, but oftentimes we were just stressed and sleep-deprived, and we were able to apologize and move on. We were able to deliver a great experience, produce a new model for online learning, and we created bonds with each other that will follow us for a long time. Should you decide to embark on an expansive adventure like that? In the words of Edwin Land (stolen from Jack Gallant’s email signature):

Don’t undertake a project unless it is manifestly important and nearly impossible
Edwin Land

I’d like to thank the organizers, Megan, Brad, Gunnar, Konrad, Paul, Kate, Carsen, Xaq, Aina, Athena, Eric, John, Alex, Yiota, Emma and Beth; the waxers Michael, Madineh, Ella, Matt, Richard, Jesse, Byron, Saeed, Ameet, Spiros; everybody that contributed to technical, Titipat, Jeff, Marco, Arash, Adam, Natalie, Guga, and Tara; and everyone who contributed, whether a few hours or weeks at a time.

Do you want to join a motley crew of education disruptors? Volunteer for NMA 2021 – calls are broadcast on Twitter.