# Approximate log determinant of huge matrices

Log determinants frequently need to be computed as part of Bayesian inference in models with Gaussian priors or likelihoods. They can be quite troublesome to compute; done through the Cholesky decomposition, it’s an $\mathcal{O}(N^3)$ operation. It’s pretty much infeasible to compute the exact log det for arbitrary matrices > 10,000 x 10,000. Models with that many parameters frequently occur in non-parametric regression methods like Gaussian processes.

Zhang and Leithead (2007) offer an update to the power series expansion Monte Carlo method of Barry & Pace (1999). The old paper expanded log det(A) into a power series. All the terms in the expansion involve matrix traces; instead of computing the traces explicitly, they are approximated through a Monte Carlo method, which involves taking random matrix vector products. Zhang and Leithead expand upon the method by decreasing the number of iid Gaussian vectors needed to 1, not requiring any restrictions on the eigenvalues of the matrix A, and using extrapolation so the power series can be truncated earlier. They claim their method is $\mathcal{O}(50N^2)$, which is potentially much better than the Cholesky version for large N.

There’s Matlab code for the method in the appendix of the paper. Unfortunately, it’s mangled in the pdf and it’s missing a required function. For convenience, here is a cleaned up version:

function logdet=mcLogDet(C,pp,kk,beta)
% input arguments: covariance matrix C,
% (optional) the number of trace seeds selection pp,
% (optional) the number of power-series terms kk,
% (optional) the ensemble adjusting factor beta.
if nargin < 2
pp = 20;
end
if nargin < 3
kk = 50;
end
if nargin < 4
beta = 1;
end

N = size(C,1);
Cn=norm(C,inf);
A=C/Cn;
if issparse(C)
B=speye(N)-A;
else
B=eye(N)-A;
end

clear C A;
traceBt=trace(B);
traceB2t=traceAxB(B,B); % a simple routine to get trace(B 2 ) with N 2 operations
deltaB122=inf;
for iii=1:pp % trace seeds selection for pp times
xxt=randn(N,1);
cct=B*xxt;
xxBNxx=xxt'*cct;
xxTxxt=xxt'*xxt;
traceBe=N*xxBNxx/xxTxxt;
deltaBt=traceBt-traceBe;
cct=B*cct;
xxBNxx=xxt'*cct;
traceB2ee=N*xxBNxx/xxTxxt;
deltaB2e=traceB2t-traceB2ee;
deltaB122new=deltaBt+deltaB2e/2;
if abs(deltaB122new)<=abs(deltaB122), % have found a better seed
xx=xxt;
cc=cct;
xxTxx=xxTxxt; % save the workspace related to the new seed
traceB1e=traceBe;
traceB2e=traceB2ee;
deltaB=deltaBt;
deltaB2=deltaB2e;
deltaB122=deltaB122new;
end
end
clear xxt cct;
rhotr=deltaB2/deltaB;
traceB=traceB1e; traceB2=traceB2e;
if abs(rhotr)>=1, % disp(‘trace rho greater than or equal to 1?');
compens2=deltaB/(1-rhotr/2); % a more robust yet loose trace-seed compensation asin Remark 3
elseif rhotr==0,
compens2=0; % no need compensation as i = 0
else
compens2=-deltaB*log(1-rhotr)/rhotr; % the standard trace-seed compensation as in Remark 3
end;
vv=traceB+traceB2/2;
preadd=traceB2; % save the computation of the first two power-series terms
compens1=0;
wthla1=0;
wthla2=0;
wthla3=0;
for jjj=3:kk, % start the computation from the third power-series term to the kth term
cc=B*cc;
xxBNxx=xx'*cc;
if preadd==0, % only for numerical-robustness consideration
compens1=0; wthl=vv-compens1-compens2;
logdet2=N*log(Cn); logdet=-wthl+logdet2;
return
end
if lambda<1,
sumlambdaik=0; lambdai=1; ndt5=0;
for iii=1:jjj, % computing the second term of the truncation-error compensation as in Proposition 5
lambdai=lambdai*lambda;
ndt5=ndt5+lambdai/(jjj+iii);
sumlambdaik=sumlambdaik+lambdai/iii;
end
if lambdai==0,
compens1=-newadd*ndt5; % truncation-error compensation via (8)
else
% the final truncation-error compensation as in Proposition 5
end
end;
wthl=vv-compens1-compens2; % for the finite sequence with the ensemble effect as in Proposition 6
wthla1=wthla2; wthla2=wthla3; wthla3=wthl;
% save the last three terms of such a sequence, i.e, k-2 , k-1 and k
end;
if lambda>=1,
% disp(‘final lambda greater than or equal to 1?');
compens1=0; % only for numerical-robustness consideration
end
aaa=wthla1;
bbb=wthla2-wthla1;
qqq=(wthla3-wthla2)/bbb;
if bbb==0,
% disp(‘The last three terms of Gamma-sequence are equal: converge well');
else
if abs(qqq)>= 1, % only for numerical-robustness consideration
% disp(‘Gamma-sequence coefficient greater than or equal to 1?');
else
wthl=aaa+bbb*qqq/(1-qqq); % Geometric series based re-estimation as in Proposition 6
end
end
logdet2=N*log(Cn);
logdet=-wthl+logdet2;
if logdet/max(abs([wthl,logdet2]))<1e-2
wthl=wthl*1;
elseif bbb/aaa>1e-4 % use beta to adjust the final ensemble effect if necessary
wthl=wthl*beta; % in general, beta:=1
end
logdet=-wthl+logdet2; % return the final logdet approximation
clear B; % the routine is thus complete
end

function r = traceAxB(A,B)
r = sum(sum(A'.*B,1),2);
end

And some sample code that shows how to use it:

thestd = 0.2;
A = eye(1000);
A = A + randn(1000)*thestd;
A = A'*A;

L = chol(A);
thelogdet = 2*sum(log(diag(L)))
thelogdeta = mcLogDet(A,20,50,1)

In the paper, the authors used (10,30,1) as the parameters of the method by I prefer (20,50,1) for safety. Mind you, the method is not super precise, they say about 10%, but for model comparison it’s more than precise enough.

## One comment

1. Somak says:

I wonder what you mean when you say “… but for model comparison it’s more than precise enough.” Because an absolute difference of 1.92 (half of 95% Chi-square quantile with 1 df) could be significant. That is if (2 x log-likelihood)’s differ by 3.84 then that’s significant assuming the d.f. = 1.
I did an experiment and this stochastic estimate of the log-det was off by 22 from the exact value (found by Cholesky). The relative error was like 0.06% granted but you know that doesn’t help.