Approximate log determinant of huge matrices

Log determinants frequently need to be computed as part of Bayesian inference in models with Gaussian priors or likelihoods. They can be quite troublesome to compute: via the Cholesky decomposition it’s an \mathcal{O}(N^3) operation, which makes the exact log det pretty much infeasible for arbitrary matrices larger than about 10,000 x 10,000. Matrices of that size turn up frequently in non-parametric regression methods like Gaussian processes.
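
For reference: if A is symmetric positive definite with Cholesky factor L (so A = L L^\top), the log determinant is read straight off the diagonal of the factor,

\log\det A = 2 \sum_{i=1}^{N} \log L_{ii},

and it is the \mathcal{O}(N^3) factorization itself, not this sum, that dominates the cost.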

Zhang and Leithead (2007) offer an update to the power-series Monte Carlo method of Barry & Pace (1999). The earlier paper expands log det(A) into a power series whose terms are all matrix traces; instead of computing those traces explicitly, it approximates them with a Monte Carlo estimator built from random matrix-vector products. Zhang and Leithead improve on the method by reducing the number of iid Gaussian vectors needed to one, dropping the restrictions on the eigenvalues of the matrix A, and using extrapolation so the power series can be truncated earlier. They claim their method costs about 50N^2 operations, which is potentially much better than the Cholesky version for large N.
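
In outline (using my own notation, chosen to match the code below rather than the paper’s): set c = \|A\|_\infty and B = I - A/c. For a symmetric positive definite A, the eigenvalues of B lie in [0, 1), so the series

\log\det A = N \log c + \log\det(I - B) = N \log c - \sum_{k=1}^{\infty} \frac{\operatorname{tr}(B^k)}{k}

converges, and each trace can be estimated from a single Gaussian vector x via the normalized estimator

\operatorname{tr}(B^k) \approx N \, \frac{x^\top B^k x}{x^\top x},

which requires nothing more than repeated matrix-vector products with B.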

There’s Matlab code for the method in the appendix of the paper. Unfortunately, it’s mangled in the PDF and is missing a required function. For convenience, here is a cleaned-up version:

function logdet=mcLogDet(C,pp,kk,beta)
% input arguments: covariance matrix C,
% (optional) number of trace-seed selection trials pp,
% (optional) number of power-series terms kk,
% (optional) ensemble adjusting factor beta.
if nargin < 2
    pp = 20;
end
if nargin < 3
    kk = 50;
end
if nargin < 4
    beta = 1;
end

N = size(C,1);
Cn=norm(C,inf);
A=C/Cn;
if issparse(C)
    B=speye(N)-A;
else
    B=eye(N)-A;
end
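% At this point B = I - C/Cn. For a symmetric positive-definite C the
% eigenvalues of B lie in [0,1), so log(det(C)) = N*log(Cn) - sum_k trace(B^k)/k
% converges; the rest of the routine estimates that series from a single random vector.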

clear C A;
traceBt=trace(B);
traceB2t=traceAxB(B,B); % a simple routine to get trace(B^2) with N^2 operations
deltaB122=inf;
for iii=1:pp % trace-seed selection, repeated pp times
    xxt=randn(N,1);
    cct=B*xxt;
    xxBNxx=xxt'*cct;
    xxTxxt=xxt'*xxt;
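    % one-sample normalized trace estimator: for a random Gaussian vector x,
    % N*(x'*B*x)/(x'*x) approximates trace(B) using only matrix-vector products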
    traceBe=N*xxBNxx/xxTxxt;
    deltaBt=traceBt-traceBe;
    cct=B*cct;
    xxBNxx=xxt'*cct;
    traceB2ee=N*xxBNxx/xxTxxt;
    deltaB2e=traceB2t-traceB2ee;
    deltaB122new=deltaBt+deltaB2e/2;
    if abs(deltaB122new)<=abs(deltaB122), % have found a better seed
        xx=xxt;
        cc=cct;
        xxTxx=xxTxxt; % save the workspace related to the new seed
        traceB1e=traceBe;
        traceB2e=traceB2ee;
        deltaB=deltaBt;
        deltaB2=deltaB2e;
        deltaB122=deltaB122new;
    end
end
clear xxt cct;
rhotr=deltaB2/deltaB;
traceB=traceB1e; traceB2=traceB2e;
if abs(rhotr)>=1, % disp('trace rho greater than or equal to 1?');
    compens2=deltaB/(1-rhotr/2); % a more robust yet loose trace-seed compensation as in Remark 3
elseif rhotr==0,
    compens2=0; % no compensation needed, since rho = 0
else
    compens2=-deltaB*log(1-rhotr)/rhotr; % the standard trace-seed compensation as in Remark 3
end;
vv=traceB+traceB2/2;
preadd=traceB2; % save the computation of the first two power-series terms
compens1=0;
wthla1=0;
wthla2=0;
wthla3=0;
for jjj=3:kk, % start the computation from the third power-series term to the kth term
    cc=B*cc;
    xxBNxx=xx'*cc;
    newadd=N*xxBNxx/xxTxx;
    if preadd==0, % only for numerical-robustness consideration
        compens1=0; wthl=vv-compens1-compens2;
        logdet2=N*log(Cn); logdet=-wthl+logdet2;
        return
    end
    lambda=newadd/preadd;
    if lambda<1,
        sumlambdaik=0; lambdai=1; ndt5=0;
        for iii=1:jjj, % computing the second term of the truncation-error compensation as in Proposition 5
            lambdai=lambdai*lambda;
            ndt5=ndt5+lambdai/(jjj+iii);
            sumlambdaik=sumlambdaik+lambdai/iii;
        end
        if lambdai==0,
            compens1=-newadd*ndt5; % truncation-error compensation via (8)
        else
            compens1=newadd*(log(1-lambda)+sumlambdaik)/lambdai;
            % the final truncation-error compensation as in Proposition 5
        end
    end;
    preadd=newadd;
    newadd=newadd/jjj;
    vv=newadd+vv;
    wthl=vv-compens1-compens2; % for the finite sequence with the ensemble effect as in Proposition 6
    wthla1=wthla2; wthla2=wthla3; wthla3=wthl;
    % save the last three terms of the sequence, i.e., terms k-2, k-1 and k
end;
if lambda>=1,
% disp('final lambda greater than or equal to 1?');
    compens1=0; % only for numerical-robustness consideration
end
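% extrapolate from the last three partial estimates, assuming the tail of the
% Gamma-sequence decays roughly geometrically (Proposition 6 in the paper)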
aaa=wthla1;
bbb=wthla2-wthla1;
if bbb==0,
    % disp('The last three terms of the Gamma-sequence are equal: converged well');
else
    qqq=(wthla3-wthla2)/bbb; % ratio of successive differences; safe now that bbb is nonzero
    if abs(qqq)>=1, % only for numerical-robustness consideration
        % disp('Gamma-sequence coefficient greater than or equal to 1?');
    else
        wthl=aaa+bbb*qqq/(1-qqq); % geometric-series based re-estimation as in Proposition 6
    end
end
logdet2=N*log(Cn);
logdet=-wthl+logdet2;
if logdet/max(abs([wthl,logdet2]))<1e-2
    wthl=wthl*1; % near-cancellation between wthl and logdet2: leave wthl unchanged
elseif bbb/aaa>1e-4 % use beta to adjust the final ensemble effect if necessary
    wthl=wthl*beta; % in general, beta:=1
end
logdet=-wthl+logdet2; % return the final logdet approximation
clear B; % the routine is thus complete
end

function r = traceAxB(A,B)
% computes trace(A*B) = sum_{i,j} A(i,j)*B(j,i) in N^2 operations,
% without forming the matrix product A*B
    r = sum(sum(A'.*B,1),2);
end

And some sample code that shows how to use it:

thestd = 0.2;
A = eye(1000);
A = A + randn(1000)*thestd; % perturb the identity with Gaussian noise
A = A'*A; % make A symmetric positive definite

L = chol(A); % exact log determinant via the Cholesky factor, for comparison
thelogdet = 2*sum(log(diag(L)))
thelogdeta = mcLogDet(A,20,50,1)

In the paper, the authors used (10, 30, 1) as the parameters of the method, but I prefer (20, 50, 1) for safety. Mind you, the method is not super precise (the authors quote errors of about 10%), but for model comparison it’s more than precise enough.

One response to “Approximate log determinant of huge matrices”

  1. I wonder what you mean when you say “… but for model comparison it’s more than precise enough.” An absolute difference of 1.92 (half of the 95% chi-square quantile with 1 d.f.) could be significant: if two values of 2 x log-likelihood differ by 3.84, that’s significant assuming d.f. = 1.
    I did an experiment, and this stochastic estimate of the log-det was off by 22 from the exact value (found by Cholesky). The relative error was only about 0.06%, granted, but that doesn’t help here.
