Estimating a statistical model via maximum likelihood or MAP involves minimizing an error function – the negative log-likelihood or negative log-posterior. Generic functions built into Matlab, like fminunc and fmincon, will often do the trick. There are many other free solvers available that are often faster or more powerful:
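As a toy illustration of that workflow, here's what minimizing a negative log-likelihood with fminunc can look like – the data and the parameterization below are made up for the example (the variance is parameterized by its log so the problem stays unconstrained):

```matlab
% Negative log-likelihood of iid Gaussian data, up to a constant,
% parameterized by the mean mu and log-variance (so no constraints needed)
x = randn(100, 1)*2 + 3;                 % synthetic data
nll = @(p) 0.5*numel(x)*p(2) + 0.5*sum((x - p(1)).^2)/exp(p(2));
p0 = [0; 0];                             % initial guess [mu; log sigma^2]
phat = fminunc(nll, p0, optimset('Display', 'off'));
mu_hat = phat(1); sigma2_hat = exp(phat(2));
```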

Solvers by Mark Schmidt: there’s a huge collection of functions from Mark Schmidt for generic constrained and unconstrained problems, as well as solvers for more specific problems, e.g. L1-regularized ones. minFunc and minConf are drop-in replacements for fminunc and fmincon, and they are often much faster. Each function has several variants, so you can fiddle with the parameters until you get acceptable speed.
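A minimal sketch of the minFunc calling convention on a synthetic least-squares problem – the objective returns both the value and the gradient, and the option fields below are from memory, so double-check them against the package:

```matlab
% minFunc expects one function handle returning [f, g]
A = randn(50, 5); b = randn(50, 1);      % synthetic problem
funObj = @(w) deal(0.5*sum((A*w - b).^2), A'*(A*w - b));
opts.Method = 'lbfgs';                   % one of several variants to try
w = minFunc(funObj, zeros(5, 1), opts);  % drop-in for fminunc
```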

CVX: this toolbox for convex optimization is not always the fastest, but it’s exceedingly powerful for constrained optimization. You specify the problem in a declarative syntax, and CVX takes care of finding a way to solve it using a number of backends, including SDPT3 and SeDuMi. See also the excellent companion book. YALMIP is an alternative with similar design goals.
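For a taste of the declarative syntax, here is roughly what a non-negative LASSO looks like in CVX – the data and penalty below are placeholders for the example:

```matlab
A = randn(30, 10); b = randn(30, 1);   % synthetic data
lambda = 0.1; p = size(A, 2);
cvx_begin quiet
    variable w(p)
    minimize( sum_square(A*w - b) + lambda*norm(w, 1) )
    subject to
        w >= 0                         % constraints read like the math
cvx_end
```

CVX parses this, verifies convexity, and hands the problem to whichever backend is configured.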

L1-penalized solvers ad nauseam: there are so many solvers available for the L1-regularized least-squares problem – aka the LASSO – that it’s getting out of hand. YALL1 is notable for solving many variants of the problem: constrained, unconstrained, positive, group-Lasso, etc. Mark Schmidt’s L1General is notable for being able to solve the general L1-penalized problem – meaning the error function is something other than the sum of squares. L1-penalized GLMs can be solved directly by L1General without resorting to IRLS.
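A sketch of that convention with one of the L1General2 solvers: you supply only the smooth part of the objective – here a logistic negative log-likelihood and its gradient, on synthetic data – and the solver handles the L1 penalty itself. The function name and signature below are from memory, so check them against the package:

```matlab
% Smooth part of an L1-penalized logistic regression: the negative
% log-likelihood sum(log(1 + exp(-y .* (X*w)))) and its gradient
X = randn(100, 8); y = sign(randn(100, 1));   % labels in {-1, +1}
funObj = @(w) deal(sum(log(1 + exp(-y.*(X*w)))), ...
                   -X'*(y./(1 + exp(y.*(X*w)))));
lambda = 0.5*ones(8, 1);                      % per-coefficient L1 penalty
w = L1General2_PSSgb(funObj, zeros(8, 1), lambda);
```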

Non-negative least-squares: Matlab’s lsqnonneg is slow. When there are more observations than variables, it’s much faster to solve the normal equations – that is, to work with the sufficient statistics X’X and X’y of the underlying generative model. Here’s one function that does the trick. When the design matrix X is exceedingly wide or sparse, on the other hand, this toolbox is very useful. You can use NNLS as a building block for non-negative GLMs via IRLS.
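The sufficient-statistics trick is worth spelling out: with R = chol(X’X) and z = R’\(X’y), the objectives ||Xw − y||² and ||Rw − z||² differ only by a constant, so they have the same minimizer – and R is only p-by-p. A sketch on synthetic data:

```matlab
% Collapse a tall NNLS problem onto its sufficient statistics
n = 20000; p = 10;
X = rand(n, p); y = X*rand(p, 1) + 0.01*randn(n, 1);
R = chol(X'*X);           % p-by-p upper-triangular factor
z = R' \ (X'*y);
w = lsqnonneg(R, z);      % same solution as lsqnonneg(X, y), much faster
```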

If your problem can be written naturally as a sum over minibatches, then you should also check out SFO, which has both MATLAB and Python implementations:

https://github.com/Sohl-Dickstein/Sum-of-Functions-Optimizer

SFO combines benefits from stochastic gradient descent and LBFGS, and is typically faster than either of them. Most importantly for your sanity, you don’t need to tune hyperparameters (except possibly the number of minibatches).

This is self-promotion of my own research though — so rather than taking my word for it you should try it :).

Very interesting. I’ll read the paper.