It sometimes happens that I need to work with huge matrices in Matlab – too large to fit into RAM (and I have 12GB of RAM on my work computer). A classic example is when doing reverse correlation with long sequences of images. There are a few ways to work with such a large dataset. You might simply have a folder with one file per image, but then your code will mix file access and data processing logic, which is annoying.
A better alternative is memmapfile. This function allows you to transparently access the data in a huge file as though the data was loaded into memory. Internally, however, memmapfile only loads as much data from disk as is requested, so as long as the processing you’re doing requires you to access only part of the data at any one time, you can work comfortably with humongous matrices.
Here’s a concrete example. Suppose you have a file that represents a very long sequence (say, 30,000 frames) of 256×256 images, stored as a stream of doubles. This data takes about 15GB to store, so it probably won’t fit into memory. You can simulate such a file like so:
h = fopen('memmapex.raw','wb');
for ii = 1:500
    fwrite(h,randn(60*256*256,1),'double');   % 500 chunks of 60 images = 30,000 images
end
fclose(h);
Now let’s say I want to average a subset of the images together, as would happen with reverse correlation. Suppose the indices of the images to average are in a vector called idx. This can be done with memmapfile like so:
idx = randperm(30000);
idx = sort(idx(1:3000));                % pick 3,000 random image indices
rmean = zeros(256,256);
m = memmapfile('memmapex.raw','Format',{'double',[256,256],'im'});
for ii = 1:length(idx)
    rmean = rmean + m.Data(idx(ii)).im;
    if mod(ii,100) == 0
        ii                              % print progress every 100 images
    end
end
rmean = rmean/length(idx);
If you monitor memory usage, you will see that the entire dataset is not loaded into memory. Yet the nth image can be accessed with m.Data(n).im. memmapfile uses the format specification {'double',[256,256],'im'} to translate array indices into byte offsets transparently.
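To make that translation concrete, here is a minimal sketch of the equivalent manual approach with fseek and fread, using the file name and image size from the example above (this is just an illustration, not how memmapfile is implemented internally):

% read the nth 256x256 image by computing its byte offset by hand
n = 1234;                            % example image index
bytesPerImage = 256*256*8;           % 8 bytes per double
h = fopen('memmapex.raw','rb');
fseek(h,(n-1)*bytesPerImage,'bof');  % jump to the start of image n
im = fread(h,[256,256],'double');    % read one image
fclose(h);
% im should be identical to m.Data(n).im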
The resulting code is much more readable than one that mixes file access with processing logic. Furthermore, Matlab caches a subset of the accessed data so that subsequent array accesses are much faster. For example, the first time I run the for loop, it takes 85 seconds, while the second time it only takes 1.15 seconds.
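If you want to see the caching effect on your own machine, a quick way is to time the same loop twice with tic/toc; the exact numbers will of course depend on your disk and OS file cache:

for run = 1:2
    tic
    rmean = zeros(256,256);
    for ii = 1:length(idx)
        rmean = rmean + m.Data(idx(ii)).im;
    end
    toc   % first pass hits the disk; second pass is mostly served from cache
end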
6 responses to “Loading huge matrices in Matlab with memmapfile”
xcorr, thanks for the quick and dirty howto! great job!
2 comments on your code line: fwrite(h,randn(3000*256*256,1),'float');
1) in many systems, feeding randn() with such a big input will cause it to run out of memory, so i suggest rewriting the for-loop like:
for ii = 1:100
fwrite(h,randn(256*256*300,1),'float');
end
or something similar.
2) the type should be ‘double’ in the fwrite() command, otherwise your next code snippet will complain at:
m = memmapfile('memmapex.raw','Format',{'double',[256,256],'im'});
both entries should be either 'float' or 'double' to avoid handling different formats.
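for instance, a sketch of the single-precision variant, assuming single precision is enough for your data (it also halves the file size):

h = fopen('memmapex.raw','wb');
for ii = 1:100
    fwrite(h,randn(256*256*300,1),'single');   % write singles instead of doubles
end
fclose(h);
% the Format class must then match the on-disk type:
m = memmapfile('memmapex.raw','Format',{'single',[256,256],'im'});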
thanx + cheers,
sleepmode
Indeed, double ~= float. My bad, I’ve changed the code accordingly.
I didn’t know about this limitation; however, if you’re going to work with such large files, then 64-bit and a boatload of RAM is the way to go. You can buy a 24GB RAM kit these days for $300:
http://www.newegg.ca/Product/Product.aspx?Item=N82E16820145350&nm_mc=OTC-Wishabi&cm_mmc=OTC-Wishabi-_-Memory+%28Desktop+Memory%29-_-Corsair-_-20145350
Oh for sure, you’re absolutely correct that anyone buying a system today would be crazy not to go 64-bit.
I’m unfortunately stuck with a machine that was great 4 years ago but is showing its age. I was hoping to get around my paltry ~3.5GB of RAM by using memory-mapped files, but on my system they’re limited to 2GB.
In the end, I think I can get around this by limiting the 'Repeat' property of the Matlab memmapfile and sliding the 'Offset' property as needed. It increases the management logic and only works because I only need to access the data sequentially, but it's a decent alternative.
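A rough sketch of that sliding-window idea, for the 30,000-image example above (the block size is arbitrary; note that 'Offset' is in bytes, so it has to advance by a whole number of images):

nImages = 30000;
blockSize = 1000;                       % ~500MB per window, well under the 2GB limit
bytesPerImage = 256*256*8;
rmean = zeros(256,256);
for start = 1:blockSize:nImages
    n = min(blockSize,nImages-start+1); % images left in this window
    m = memmapfile('memmapex.raw', ...
        'Format',{'double',[256,256],'im'}, ...
        'Offset',(start-1)*bytesPerImage, ...   % slide the window through the file
        'Repeat',n);                            % map only this many images
    for ii = 1:n
        rmean = rmean + m.Data(ii).im;
    end
end
rmean = rmean/nImages;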
Thanks again for your post. I’m sure it will be helpful to others.
Hi There,
Thanks for your example, I was looking into memmapfile for the exact same purpose: working with huge datasets.
Unfortunately, I found that A BIG asterisk is needed on this topic: older versions of Matlab and 32-bit OSes restrict memory maps to 2GB :(
You are surely running a 64-bit OS given your 12GB of RAM (drool!), but my Matlab 2009 32-bit documentation says:
“Maximum Size of a Memory Map
Due to limits set by the operating system and MATLAB, the maximum amount of data you can map with a single instance of a memory map is 2 gigabytes on 32-bit systems, and 256 terabytes on 64-bit systems. If you need to map more than this limit, you can either create separate maps for different regions of the file, or you can move the window of one map to different locations in the file.”
That explains why I get a “File too large to memory map” error when trying to run this example.
It seems Matlab 2007 limits both 32 & 64 bit versions to 2GB (from the 2007 documentation):
“Maximum Size of a Memory Map.
…
The 2 GB limit also applies to 64-bit platforms. However, because 64-bit platforms have a much larger address space, they can support having many more map instances in memory at any given time.”
As a result, this feature is quite limited on 32-bit systems or versions older than 2009.
Can you comment on which version of Matlab you are running?