Michael J. Kane
Yale University and Phronesis LLC
A package for creating, storing, accessing, and manipulating dense (and semi-dense) matrices that are larger than available RAM.
It's been around since 2008 - I wrote it with Jay Emerson
Part of a suite of packages for processing matrices out-of-core (biganalytics, bigtabulate, bigalgebra, synchronicity)
Currently maintained by me and Pete Haverty
> library(bigmemory)
> x = big.matrix(3, 3, type='integer', init=123,
+ backingfile="example.bin",
+ descriptorfile="example.desc",
+ dimnames=list(c('a','b','c'),
+ c('d', 'e', 'f')))
> x[,]
    d   e   f
a 123 123 123
b 123 123 123
c 123 123 123
> rm(x)
> y = attach.big.matrix("example.desc")
> y[,]
    d   e   f
a 123 123 123
b 123 123 123
c 123 123 123
mmap - a POSIX-compliant Unix system call that maps files or devices into memory
All data movement (disk to RAM to cache) is handled transparently by the operating system.
The binary representation of the matrix is stored directly on disk.
The descriptor file holds meta-information (number of rows, number of columns, etc.).
Works with any filesystem supporting mmap (including distributed ones).
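A minimal sketch of the points above, assuming bigmemory is installed and that writing to a temporary directory is acceptable. The file names sketch.bin and sketch.desc are hypothetical, chosen only for illustration. It creates a file-backed matrix, checks that the backing and descriptor files exist on disk, and attaches a second handle through the descriptor to show that both handles refer to the same mapped data.

```r
library(bigmemory)

# Work in a temporary directory so nothing is left in the project folder.
old <- setwd(tempdir())

# Hypothetical file names, used only for this sketch.
a <- big.matrix(3, 3, type = "integer", init = 0,
                backingfile = "sketch.bin",
                descriptorfile = "sketch.desc")

# The backing file holds the matrix values; the descriptor holds the metadata.
file.exists("sketch.bin")
file.exists("sketch.desc")
file.info("sketch.bin")$size   # size in bytes of the on-disk representation

# A second handle attached through the descriptor maps the same file.
b <- attach.big.matrix("sketch.desc")
dim(b)

# A write through one handle is visible through the other.
a[2, 3] <- 7L
b[2, 3]

setwd(old)
```

Neither handle reads the file explicitly; paging the data in and out is left to the operating system.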
CRAN
Reverse depends: bigalgebra, biganalytics, bigpca, bigrf, bigtabulate
Reverse imports: Rdsm
Reverse linking to: bigalgebra, biganalytics, bigrf, bigtabulate
Reverse suggests: bio3d, matpow, mlDNA, nat.nblast, NMF, PopGenome, rsgcc
Reverse enhances: bigmemory.sri
Bioconductor
Reverse depends: bigmemoryExtras, ChipXpressData, Biobase and BiocGenerics (through bigmemoryExtras)
Only import data once
It's generally faster than swapping
It's compatible with BLAS and LAPACK libraries
Data structures (the binary representation) could be stored persistently and would not need to be explicitly imported
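The "import once" idea can be sketched with bigmemory as it exists today, assuming the package is installed. The file names measurements.csv, measurements.bin, and measurements.desc are hypothetical; the small CSV is written here only so the example is self-contained. The first run pays the parsing cost through read.big.matrix; later runs attach the existing backing file without re-reading the text.

```r
library(bigmemory)

# Work in a temporary directory so nothing is left in the project folder.
old <- setwd(tempdir())

# Hypothetical input data, written only to make the sketch runnable.
write.table(data.frame(u = c(1, 2, 3), v = c(4, 5, 6)),
            "measurements.csv", sep = ",", quote = FALSE, row.names = FALSE)

if (!file.exists("measurements.desc")) {
  # First run: parse the text file once and keep the binary form on disk.
  m <- read.big.matrix("measurements.csv", sep = ",", header = TRUE,
                       type = "double",
                       backingfile = "measurements.bin",
                       descriptorfile = "measurements.desc")
} else {
  # Later runs: map the existing binary file; the CSV is not parsed again.
  m <- attach.big.matrix("measurements.desc")
}

# Any other session or handle can attach the same way.
m2 <- attach.big.matrix("measurements.desc")
m2[, ]

setwd(old)
```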
bigmemory's (and ff's) users show that there is demand for memory-mapped objects
We've also shown that they can be performant
How can they be better integrated?
More importantly, should they be better integrated?