Log determinants frequently need to be computed as part of Bayesian inference in models with Gaussian priors or likelihoods. They can be quite troublesome to compute; done through the Cholesky decomposition, it’s an O(N³) operation. It’s pretty much infeasible to compute the exact log det for arbitrary matrices larger than 10,000 x 10,000, and models with that many parameters frequently occur in non-parametric regression methods like Gaussian processes.
Zhang and Leithead (2007) offer an update to the power-series expansion Monte Carlo method of Barry & Pace (1999). The older paper expands log det(A) into a power series whose terms all involve matrix traces; instead of computing the traces explicitly, they are approximated by a Monte Carlo method based on random matrix-vector products. Zhang and Leithead extend the method by reducing the number of iid Gaussian vectors needed to one, removing the restrictions on the eigenvalues of the matrix A, and using extrapolation so the power series can be truncated earlier. They claim their method is O(N²), which is potentially much better than the O(N³) Cholesky version for large N.
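To make the idea concrete: the code below normalizes by the infinity norm, setting B = I - C/||C||_inf, so that

log det(C) = N*log(||C||_inf) - sum_{j>=1} trace(B^j)/j,

and each trace(B^j) is estimated from a single random vector x via trace(M) ≈ N*(x'*M*x)/(x'*x). Here is a bare-bones sketch of that estimator, stripped of the paper’s seed selection, compensation terms, and extrapolation (the function name is mine, and it assumes C is positive definite so the series converges):

function logdet = naiveMcLogDet(C, k)
% Bare-bones power-series Monte Carlo log-determinant (illustration only,
% not the authors' refined method). Assumes C is positive definite so
% that the series converges.
N = size(C,1);
Cn = norm(C,inf);
B = eye(N) - C/Cn;      % log det(C) = N*log(Cn) + log det(I - B)
x = randn(N,1);         % a single iid Gaussian probe vector
c = x;
s = 0;
for j = 1:k
    c = B*c;                        % c = B^j * x, one matrix-vector product per term
    s = s + (N*(x'*c)/(x'*x))/j;    % trace(B^j) ~= N * x'*B^j*x / (x'*x)
end
logdet = N*log(Cn) - s; % since log det(I - B) = -sum_j trace(B^j)/j
end

The full routine below adds seed selection, compensation, and extrapolation precisely because this single-vector estimate is noisy and truncating the series at k terms introduces bias.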
There’s Matlab code for the method in the appendix of the paper. Unfortunately, it’s mangled in the PDF and missing a required function. For convenience, here is a cleaned-up version:
function logdet = mcLogDet(C, pp, kk, beta)
% Input arguments: covariance matrix C,
% (optional) the number of trace-seed selections pp,
% (optional) the number of power-series terms kk,
% (optional) the ensemble adjusting factor beta.
if nargin < 2, pp = 20; end
if nargin < 3, kk = 50; end
if nargin < 4, beta = 1; end
N = size(C,1);
Cn = norm(C,inf);
A = C/Cn;
if issparse(C)
    B = speye(N) - A;
else
    B = eye(N) - A;
end
clear C A;
traceBt = trace(B);
traceB2t = traceAxB(B,B); % a simple routine to get trace(B^2) with N^2 operations
deltaB122 = inf;
for iii = 1:pp % trace-seed selection, repeated pp times
    xxt = randn(N,1);
    cct = B*xxt;
    xxBNxx = xxt'*cct;
    xxTxxt = xxt'*xxt;
    traceBe = N*xxBNxx/xxTxxt;
    deltaBt = traceBt - traceBe;
    cct = B*cct;
    xxBNxx = xxt'*cct;
    traceB2ee = N*xxBNxx/xxTxxt;
    deltaB2e = traceB2t - traceB2ee;
    deltaB122new = deltaBt + deltaB2e/2;
    if abs(deltaB122new) <= abs(deltaB122) % have found a better seed
        xx = xxt; cc = cct; xxTxx = xxTxxt; % save the workspace related to the new seed
        traceB1e = traceBe; traceB2e = traceB2ee;
        deltaB = deltaBt; deltaB2 = deltaB2e;
        deltaB122 = deltaB122new;
    end
end
clear xxt cct;
rhotr = deltaB2/deltaB;
traceB = traceB1e;
traceB2 = traceB2e;
if abs(rhotr) >= 1
    % disp('trace rho greater than or equal to 1?');
    compens2 = deltaB/(1 - rhotr/2); % a more robust yet loose trace-seed compensation as in Remark 3
elseif rhotr == 0
    compens2 = 0; % no compensation needed, as rhotr = 0
else
    compens2 = -deltaB*log(1 - rhotr)/rhotr; % the standard trace-seed compensation as in Remark 3
end
vv = traceB + traceB2/2;
preadd = traceB2; % save the computation of the first two power-series terms
compens1 = 0;
wthla1 = 0; wthla2 = 0; wthla3 = 0;
for jjj = 3:kk % compute from the third power-series term to the kk-th term
    cc = B*cc;
    xxBNxx = xx'*cc;
    newadd = N*xxBNxx/xxTxx;
    if preadd == 0 % only for numerical-robustness consideration
        compens1 = 0;
        wthl = vv - compens1 - compens2;
        logdet2 = N*log(Cn);
        logdet = -wthl + logdet2;
        return
    end
    lambda = newadd/preadd;
    if lambda < 1
        sumlambdaik = 0; lambdai = 1; ndt5 = 0;
        for iii = 1:jjj % computing the second term of the truncation-error compensation as in Proposition 5
            lambdai = lambdai*lambda;
            ndt5 = ndt5 + lambdai/(jjj+iii);
            sumlambdaik = sumlambdaik + lambdai/iii;
        end
        if lambdai == 0
            compens1 = -newadd*ndt5; % truncation-error compensation via (8)
        else
            compens1 = newadd*(log(1-lambda) + sumlambdaik)/lambdai; % the final truncation-error compensation as in Proposition 5
        end
    end
    preadd = newadd;
    newadd = newadd/jjj;
    vv = newadd + vv;
    wthl = vv - compens1 - compens2; % the finite sequence with the ensemble effect as in Proposition 6
    wthla1 = wthla2; wthla2 = wthla3; wthla3 = wthl; % save the last three terms of the sequence, at k-2, k-1 and k
end
if lambda >= 1
    % disp('final lambda greater than or equal to 1?');
    compens1 = 0; % only for numerical-robustness consideration
end
aaa = wthla1;
bbb = wthla2 - wthla1;
if bbb == 0
    % disp('The last three terms of the Gamma-sequence are equal: converged well');
else
    qqq = (wthla3 - wthla2)/bbb;
    if abs(qqq) >= 1 % only for numerical-robustness consideration
        % disp('Gamma-sequence coefficient greater than or equal to 1?');
    else
        wthl = aaa + bbb*qqq/(1 - qqq); % geometric-series-based re-estimation as in Proposition 6
    end
end
logdet2 = N*log(Cn);
logdet = -wthl + logdet2;
if logdet/max(abs([wthl, logdet2])) < 1e-2
    wthl = wthl*1; % leave wthl unchanged
elseif bbb/aaa > 1e-4 % use beta to adjust the final ensemble effect if necessary
    wthl = wthl*beta; % in general, beta := 1
end
logdet = -wthl + logdet2; % return the final logdet approximation
clear B; % the routine is thus complete
end

function r = traceAxB(A,B)
% trace(A*B) in O(N^2) operations, without forming the matrix product
r = sum(sum(A'.*B, 1), 2);
end
And some sample code that shows how to use it:
% Build a random 1000 x 1000 symmetric positive-definite test matrix
thestd = 0.2;
A = eye(1000);
A = A + randn(1000)*thestd;
A = A'*A;

% Exact log determinant via Cholesky: det(A) = prod(diag(L))^2
L = chol(A);
thelogdet = 2*sum(log(diag(L)))

% Monte Carlo approximation, for comparison
thelogdeta = mcLogDet(A, 20, 50, 1)
In the paper, the authors used (10,30,1) as the parameters of the method, but I prefer (20,50,1) for safety. Mind you, the method is not super precise (they say about 10%), but for model comparison it’s more than precise enough.
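For instance, to compare the two settings on the test matrix A from the sample above (variable names are mine):

% The paper's parameters versus the more conservative ones
thelogdet_paper = mcLogDet(A, 10, 30, 1)
thelogdet_safe = mcLogDet(A, 20, 50, 1)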
One response to “Approximate log determinant of huge matrices”
I wonder what you mean when you say “… but for model comparison it’s more than precise enough.” An absolute difference of 1.92 (half the 95% chi-square quantile with 1 d.f.) could be significant. That is, if two models’ (2 x log-likelihood) values differ by 3.84, that’s significant, assuming d.f. = 1.
I did an experiment and this stochastic estimate of the log-det was off by 22 from the exact value (found by Cholesky). Granted, the relative error was only about 0.06%, but that doesn’t help when it’s the absolute difference that matters.