Boosting in Matlab: a package

I just posted a package to do boosting in generalized linear and additive models (GLMs and GAMs) on Matlab Central. Boosting is a method for fitting GAMs. These are models composed of nonlinear functions (“learners” or “smoothers”) added together to form an internal response. That internal response is then transduced into an observed response by a nonlinearity followed by a (possibly non-Gaussian) stochastic process. GLMs can be viewed as a special case of GAMs in which the learners are linear. GAMs have an additive structure that makes them easy to fit compared to more general models, yet they are extremely powerful. For example, GAMs are a useful framework for describing the relationship between a stimulus and a neuron’s firing rate. There, the learners are functions of the stimulus, and their outputs are added together and sent through a rectifying nonlinearity to drive a Poisson process.
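The neuron example can be made concrete with a minimal sketch of the generative model. It is written in Python rather than the package's Matlab. Each learner maps a stimulus to a number, and the outputs are summed into the internal response eta. A rectifying nonlinearity turns eta into a firing rate, and a Poisson draw gives the spike count. Several pieces are illustrative assumptions, not taken from the package: the stimulus encoding (contrast, orientation), the three learners, the use of softplus as the rectifier, and all function names.

```python
import math
import random

def softplus(x):
    """Smooth rectifying nonlinearity log(1 + e^x); returns x for large x to avoid overflow."""
    return x if x > 30 else math.log1p(math.exp(x))

def internal_response(stimulus, learners):
    """eta: the sum of the learner outputs (the additive part of the GAM)."""
    return sum(f(stimulus) for f in learners)

def sample_poisson(rate, rng):
    """Draw one Poisson count with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulate_spikes(stimulus, learners, rng):
    """Stimulus -> learners -> sum -> rectifying nonlinearity -> Poisson spike count."""
    eta = internal_response(stimulus, learners)
    return sample_poisson(softplus(eta), rng)

# Hypothetical learners for a stimulus given as (contrast, orientation in radians).
learners = [
    lambda s: 0.5 * s[0],       # linear in contrast
    lambda s: math.cos(s[1]),   # orientation tuning curve
    lambda s: -1.0,             # constant offset
]

spikes = simulate_spikes((2.0, 0.0), learners, random.Random(1))
```

Changing a model only means swapping entries in the learners list. The sum, the nonlinearity, and the noise model stay untouched.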

Boosting is a method to fit GAMs that starts with an empty model and gradually builds it up by iteratively adding “the most useful” learner out of a set of candidate learners. The reasoning behind this is the following. Call your “internal response” eta_n, where n indexes the observations. Call your observation vector y_n, and the log-likelihood of the data given an internal response L(y_n,eta_n). Then the vector of derivatives dL/d(eta_n), taken over all observations n, is a perfectly legitimate gradient of the log-likelihood with respect to the internal response. Next, evaluate every learner on the data and determine which one has the greatest normalized projection onto this gradient. Going from eta to eta + alpha*(bestlearner) is then a form of functional gradient ascent. Boosting in GLMs is also closely related to L1-regularized (sparse) GLMs.
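This procedure can be sketched as componentwise gradient boosting for a Poisson model with an exponential inverse link. For that model the gradient is dL/d(eta_n) = y_n - exp(eta_n). The sketch is Python, not the package's Matlab code. Each learner is represented by its values on the n observations. The step alpha is assumed to be a shrunken least-squares fit of the chosen learner to the gradient, with shrinkage factor nu. The names `boost` and `poisson_loglik_grad` are hypothetical.

```python
import math

def poisson_loglik_grad(y, eta):
    """dL/d(eta_n) for a Poisson model with exp inverse link: y_n - exp(eta_n)."""
    return [yn - math.exp(en) for yn, en in zip(y, eta)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def boost(y, learners, n_iter=3000, nu=0.1):
    """Componentwise functional gradient ascent on the Poisson log-likelihood.

    learners: list of vectors h_j, each a learner evaluated on all n observations.
    Returns the final internal response eta and the total weight given to each learner.
    """
    eta = [0.0] * len(y)                 # start from the empty model
    coefs = [0.0] * len(learners)
    for _ in range(n_iter):
        g = poisson_loglik_grad(y, eta)
        # Score each learner by its normalized projection onto the gradient.
        scores = [abs(dot(g, h)) / math.sqrt(dot(h, h)) for h in learners]
        j = max(range(len(learners)), key=lambda k: scores[k])
        h = learners[j]
        # Assumed step rule: shrunken least-squares fit of h to the gradient.
        alpha = nu * dot(g, h) / dot(h, h)
        eta = [e + alpha * hn for e, hn in zip(eta, h)]
        coefs[j] += alpha
    return eta, coefs

# Toy data: noise-free targets whose true internal response is 0.5 + x.
x = [-1.0 + 0.1 * i for i in range(21)]
y = [math.exp(0.5 + xi) for xi in x]
learners = [[1.0] * 21, x, [math.cos(3 * xi) for xi in x]]  # constant, linear, irrelevant
eta, coefs = boost(y, learners)
# coefs should end up close to [0.5, 1.0, 0.0]
```

With a smaller n_iter, learners that were never selected keep a coefficient of exactly zero. This gives some intuition for why boosting behaves like a sparse fit.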

It’s trivial to write a boosting implementation in Matlab, but usually it will be limited to fitting one particular GLM (say, logistic regression) or one particular type of learner. My goal here was to make an implementation in which it is easy to add new learners and link/distribution pairs. So I planned the thing in UML and implemented it as a full-blown set of classes. I am pleasantly surprised by the evolution of object-oriented programming in Matlab. With classdef and the handle base class, you can write full-blown classes with pass-by-reference semantics. Matlab supports packages, abstract classes, multiple inheritance, operator overloading, and pretty much any feature you can think of (except for strong typing). A really awesome feature is that Matlab parses the comments in your class files and can auto-generate javadoc-style documentation (doc YourClass). Too bad nobody knows about this.
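The extensible design can be illustrated with abstract base classes. There is one interface for learners and one for link/distribution pairs, and the selection step is written only against those interfaces. This is a minimal Python sketch in which the `abc` module stands in for Matlab's abstract classdef methods. The class and function names (`Learner`, `LinearLearner`, `Distribution`, `PoissonExp`, `BernoulliLogit`, `best_learner`) are hypothetical and are not the package's actual classes.

```python
import math
from abc import ABC, abstractmethod

class Learner(ABC):
    """A candidate learner: maps the design data X to one value per observation."""
    @abstractmethod
    def evaluate(self, X):
        ...

class LinearLearner(Learner):
    """Linear learner that picks out one column of X (the GLM case)."""
    def __init__(self, column):
        self.column = column

    def evaluate(self, X):
        return [row[self.column] for row in X]

class Distribution(ABC):
    """A link/distribution pair: inverse link plus d(log-likelihood)/d(eta)."""
    @abstractmethod
    def inverse_link(self, eta):
        ...

    @abstractmethod
    def grad_loglik(self, y, eta):
        ...

class PoissonExp(Distribution):
    def inverse_link(self, eta):
        return math.exp(eta)

    def grad_loglik(self, y, eta):
        return y - math.exp(eta)

class BernoulliLogit(Distribution):
    def inverse_link(self, eta):
        return 1.0 / (1.0 + math.exp(-eta))

    def grad_loglik(self, y, eta):
        return y - self.inverse_link(eta)

def best_learner(dist, learners, X, y, eta):
    """Index of the learner with the largest normalized projection on the gradient.

    Only the abstract interfaces are used, so new learners or distributions
    plug in without changing this function. Assumes no learner is all zeros.
    """
    g = [dist.grad_loglik(yn, en) for yn, en in zip(y, eta)]

    def score(learner):
        h = learner.evaluate(X)
        norm = math.sqrt(sum(v * v for v in h))
        return abs(sum(gn * hn for gn, hn in zip(g, h))) / norm

    return max(range(len(learners)), key=lambda k: score(learners[k]))
```

In Matlab the same pattern maps onto classdef files with `methods (Abstract)` blocks. Deriving from handle additionally gives the objects reference semantics.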

Anyways, enjoy and upvote.
