Working with 2d data defined on a grid is pretty straightforward in Matlab; you can easily visualize the data with imagesc and smooth it through convolution. Working with data sampled at non-uniform intervals is much trickier. Doing something as simple as visualizing it becomes a pain.
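For reference, the gridded case really is just a couple of lines. Here is a throwaway sketch, with peaks standing in for real data and a hand-rolled Gaussian kernel:

%Gridded data: visualize with imagesc, smooth by convolution (a throwaway sketch)
Z = peaks(100) + randn(100)*.2;   %some gridded data, peaks standing in for the real thing
g = exp(-(-5:5).^2/(2*1.5^2));    %1D Gaussian kernel, sd = 1.5 pixels
g = g/sum(g);
Zs = conv2(g, g, Z, 'same');      %separable 2D Gaussian smoothing
subplot(1,2,1); imagesc(Z);  axis image; title('raw')
subplot(1,2,2); imagesc(Zs); axis image; title('smoothed')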
One way of getting around this is to resample the data on a uniform grid. This works well if your data is densely sampled. If not, you’ll have to fit the data with a nonparametric surface. The gridfit function available on Matlab Central does this. The technical details are sketchy, but from what I gather the bilinear option basically approximates the surface with a tensor product of linear B-splines. The fit is regularized by a smoothness penalty whose strength is set by the end user. It works pretty well, but I don’t really like the way it handles edges.
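For the densely sampled case, the resampling route can be as simple as a call to Matlab’s built-in griddata. A minimal sketch, where x, y and v stand in for your scattered coordinates and values:

%Resample scattered data onto a uniform grid (sketch; x, y, v are placeholders)
[xg,yg] = meshgrid(linspace(min(x),max(x),200), linspace(min(y),max(y),200));
vg = griddata(x, y, v, xg, yg, 'linear');   %NaN outside the convex hull of the data
imagesc(xg(1,:), yg(:,1), vg); axis xy; colorbar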
I searched for other Matlab tools that rely on tensor products of splines, but I couldn’t find anything very promising. MathWorks has a documentation page on tensor products of splines, but the syntax is very low-level and far from intuitive. The curve and surface fitting toolbox doesn’t include a ready-made method for fitting surfaces with splines; the only smoothing option on offer is Lowess.
The situation is much better in R, where Simon Wood maintains the mgcv package, which can fit surfaces to data via Generalized Additive Models (GAMs). He has a fantastic book on this subject (which also has an excellent intro to GLMs, btw).
A worthy alternative to spline-based smoothing of surfaces is Gaussian process regression. Indeed, kriging, which is mathematically equivalent to Gaussian process regression, was used to smooth and interpolate 2d and 3d geostatistical data well before Gaussian processes became popular in machine learning circles. You can use the GPML package to fit a Gaussian process to your data. The syntax is geared towards machine learning types rather than day-to-day 2d data smoothing, but the documentation is extensive. With Gaussian processes you can determine the optimal smoothing strength automatically, by maximizing the marginal likelihood. You might find geostatistical packages, such as this one, more user-friendly, although I have yet to try them.
The top figure shows some data I simulated in the shape of a Gabor; it is noisy and irregularly sampled. The Gaussian process reconstruction is quite nice. One especially useful aspect of Gaussian processes is that you get uncertainty estimates for free: here you can see that the uncertainty rises far from the data points. The gridfit reconstruction is not bad, but it handles the edges less naturally.
The data was generated by:
%Simulate a Gabor
nobs = 100;
xy = randn(nobs,2)*3;
obs = exp(-(xy(:,1).^2+xy(:,2).^2)/2/1^2).*sin(-(xy(:,2)))+randn(nobs,1)*.1;
scatter(xy(:,1),xy(:,2),50,obs,'filled')
Fitting with a Gaussian process, assuming the GPML toolbox is on your path, is done like so:
%2D version
%This is to handle the case where data has non-zero mean and slope
meanfunc = {@meanSum,{@meanLinear, @meanConst}};
%Squared exponential covariance with ARD for smoothing-effect similar to
%that of convolution with a Gaussian
covfunc = {@covSEard};
%Gaussian likelihood for normal errors
likfunc = @likGauss;

%Initialize the hyperparameters like they say in the GPML doc
ell = 1; sf = 1;
hyp2.cov = log([ell,ell,sf]);
hyp2.lik = log(0.1);
hyp2.mean = [0;0;0];

%Optimize the hyperparameters to get the best smoothing
hyp2 = minimize(hyp2, @gp, -100, @infExact, meanfunc, covfunc, likfunc, xy, obs);

%test meshgrid
[xi,yi] = meshgrid(-10:.05:10,-10:.05:10);
X2 = [xi(:),yi(:)];

%Perform inference and get the results on new datapoints
[m,s2] = gp(hyp2, @infExact, meanfunc, covfunc, likfunc, xy, obs, X2);
%reshape(m,size(xi)) gives you the surface
%reshape(sqrt(s2),size(xi)) gives you the uncertainty estimates
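To look at the result, reshape the outputs back onto the grid, along the lines of the comments above. A quick plotting sketch:

%Visualize the smoothed surface and its uncertainty
subplot(1,2,1)
imagesc(xi(1,:), yi(:,1), reshape(m,size(xi))); axis xy; colorbar
title('GP posterior mean')
subplot(1,2,2)
imagesc(xi(1,:), yi(:,1), reshape(sqrt(s2),size(xi))); axis xy; colorbar
title('GP posterior standard deviation')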
Finally, gridfit is used like so:
zi = gridfit(xy(:,1),xy(:,2),obs,xi(1,:)',yi(:,1),'interp','bilinear','smoothness',50);
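And for a quick look at the gridfit reconstruction next to the raw samples, something like this (a sketch reusing the grid and variables from the snippets above) will do:

%Display the gridfit reconstruction with the raw samples overlaid (sketch)
imagesc(xi(1,:), yi(:,1), zi); axis xy; colorbar
hold on; scatter(xy(:,1), xy(:,2), 25, obs, 'filled'); hold off
title('gridfit reconstruction')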
One response to “Smoothing a non-uniformly sampled surface”
This may not strictly be related to your post… but,
What is the best way to ensure that minimize finds a good minimum through the choice of initial hyperparameters? For long compound covariance functions with many hypers, I often find this rather difficult. Thanks!