### Using an L1 penalty with an arbitrary error function

The L1 penalty, which corresponds to a Laplace prior on the weights, encourages model parameters to be sparse. There are plenty of solvers for the L1-penalized least-squares problem, $\arg \min _\mathbf{w} ||\mathbf{y}-\mathbf{Xw}||_2^2 + \lambda ||\mathbf{w}||_1$, but it's harder to find methods that handle error functions other than the sum of squares. L1General by Mark Schmidt solves exactly this more general problem: an arbitrary smooth error function plus an L1 penalty. More than a dozen different algorithms are implemented in Matlab, so you should be able to find something that works well for your problem. I've been having good luck with L1General2_TMP in a sparse convolutional neural net embedded in a GLM scaffold.
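To make the general problem concrete, here's a minimal Python sketch of proximal gradient descent (ISTA), one standard way to minimize a smooth loss plus an L1 penalty. This is an illustration of the technique, not L1General's actual code; the function names and the toy data are made up, and the smooth loss here is the logistic loss.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t*||.||_1: shrink each coordinate toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logistic(X, y, lam, n_iter=1000):
    """Minimize mean logistic loss + lam*||w||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2  # 1/L, with L = ||X||_2^2 / (4n)
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # model probabilities
        grad = X.T @ (p - y) / n           # gradient of the smooth part only
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Toy problem: 3 relevant features out of 20
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -2.0, 2.0]
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
w_hat = l1_logistic(X, y, lam=0.05)
```

Because the soft-thresholding step sets coordinates exactly to zero, the recovered weight vector is genuinely sparse, not just small; swap in any other smooth loss and its gradient and the same scaffold applies.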

If you’re interested in the more specialized problem of fitting a vanilla GLM with a sparseness penalty, you’re in luck: there are plenty of dedicated packages for that. If you have a recent version of Matlab, you can use the Statistics Toolbox function lassoglm. There’s also glmnet, which is based on (I think) coordinate descent. And then there’s the package I published as part of Mineault et al. (2009), which you might like.
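One trick these GLM packages (glmnet in particular) share is fitting a whole path of penalty values from large to small, warm-starting each fit at the previous solution. Here's a hedged sketch of that strategy in Python, using a simple proximal-gradient inner solver for an L1-penalized logistic GLM; the helper names and data are invented for illustration and this is not glmnet's API or algorithm.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_path(X, y, lams, n_iter=300):
    """Fit an L1-penalized logistic GLM along a decreasing lambda path,
    warm-starting each fit at the previous solution."""
    n, d = X.shape
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    path = []
    for lam in lams:
        for _ in range(n_iter):
            p = 1.0 / (1.0 + np.exp(-X @ w))
            w = soft_threshold(w - step * (X.T @ (p - y) / n), step * lam)
        path.append(w.copy())
    return np.array(path)

# Toy data, as before
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -2.0, 2.0]
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

# Smallest lambda that keeps every coefficient at zero (no-intercept model):
# at w = 0 the gradient is X.T @ (0.5 - y) / n, so this is its max magnitude.
lam_max = np.abs(X.T @ (y - 0.5) / len(y)).max()
lams = lam_max * np.logspace(0, -2, 10)  # geometric grid of penalties
path = fit_path(X, y, lams)
```

Starting each penalty level from the previous solution means each inner solve only has to make a small correction, which is a large part of why path-based fitters are fast in practice.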