Distributed Data Structures in R for General, Large-Scale Computing
Michael J. Kane
Phronesis, LLC and Yale University
Acknowledgements
Simon Urbanek and AT&T Research Labs
A portion of this research is based on research sponsored by DARPA under award FA8750-12-2-0324. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.
Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.
Current Prototypical Communication For Distributed Computing
Redis
- Workers block on queues (lists)
- Consume computations
- Return results on another queue
MPI
- Data movement is defined up-front
- Processes communicate directly
[Slide labels: Communication, Orthogonality, Performance]
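The Redis-style queue pattern listed on this slide can be sketched in a few lines of R. This is only an illustration under stated assumptions: the two queues are in-memory stand-ins for Redis lists, no Redis server or client package is involved, and the helper names (`enqueue`, `dequeue`, `run_worker`) are hypothetical. A real worker would block on an empty queue rather than stop.

```r
# In-memory stand-ins for two Redis lists (assumption: no real Redis server).
queues <- new.env()
queues$jobs <- list()
queues$results <- list()

# Append an item to the tail of a named queue.
enqueue <- function(name, item) {
  queues[[name]] <- c(queues[[name]], list(item))
  invisible(NULL)
}

# Remove and return the head of a named queue, or NULL if it is empty.
# A real worker would block here instead of returning NULL.
dequeue <- function(name) {
  q <- queues[[name]]
  if (length(q) == 0L) return(NULL)
  queues[[name]] <- q[-1L]
  q[[1L]]
}

# Worker loop: consume computations, return results on another queue.
run_worker <- function() {
  while (!is.null(job <- dequeue("jobs"))) {
    enqueue("results", list(id = job$id, value = eval(job$expr)))
  }
}

enqueue("jobs", list(id = 1L, expr = quote(sum(1:10))))
enqueue("jobs", list(id = 2L, expr = quote(sqrt(16))))
run_worker()
first <- dequeue("results")
```

Note that the producer and the worker never address each other directly; they only share the names of the two queues.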
How does this work?
How is it used?
- pull( <resource>, <expression> ) - executes an expression at a resource location and returns the result
- push( <resource>, <expression> ) - executes an expression at a resource location and returns the name of a new resource
C <- push( "B", "B %*% pull('A', 'A')" )
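To make the semantics of `pull` and `push` concrete, here is a toy, single-process sketch. It is not the package's implementation: the resource store is a local environment rather than a set of remote workers, and the helper `eval_at`, the `store` environment, and the generated names such as `resource_1` are all assumptions made for illustration.

```r
# Toy in-process resource store; a real system would keep resources on
# remote workers and ship the expression to them.
store <- new.env()
counter <- 0L

# Evaluate an expression string with the named resource bound to its value.
eval_at <- function(resource, expression) {
  scope <- new.env(parent = environment(eval_at))
  assign(resource, get(resource, envir = store), envir = scope)
  eval(parse(text = expression), envir = scope)
}

# pull: run the expression where the resource lives and return the result.
pull <- function(resource, expression) eval_at(resource, expression)

# push: run the expression, keep the result as a new resource,
# and return only the new resource's name.
push <- function(resource, expression) {
  value <- eval_at(resource, expression)
  counter <<- counter + 1L
  name <- paste0("resource_", counter)
  assign(name, value, envir = store)
  name
}

assign("A", matrix(1:4, 2, 2), envir = store)
assign("B", diag(2), envir = store)

C <- push("B", "B %*% pull('A', 'A')")
result <- pull(C, C)  # fetch the value held by the new resource
```

In this sketch `C` is just a name; the product itself stays in the store until something pulls it.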
A Generative Communication Framework (cf. Gelernter 1985)
Can be thought of as a "functional" version of tuplespaces
- space uncoupling
- time uncoupling
- distributed sharing
- support for continuation
Bottom line is that you can support a much larger class of distributed communication patterns.
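For readers unfamiliar with tuplespaces, the following single-process sketch shows the classic Linda-style operations (out, rd, in) that the framework is being compared to. It illustrates the general idea only, not this framework's API: the function names `ts_out`, `ts_rd`, and `ts_in` are hypothetical, `NA` is used as a wildcard by assumption, and a real tuplespace would block on a missing tuple rather than return `NULL`.

```r
# The shared space: a list of tuples (each tuple is itself a list).
space <- list()

# out: add a tuple to the space.
ts_out <- function(...) {
  space[[length(space) + 1L]] <<- list(...)
  invisible(NULL)
}

# A pattern matches a tuple of the same length whose non-NA fields are identical.
matches <- function(pattern, tuple) {
  length(pattern) == length(tuple) &&
    all(vapply(seq_along(pattern), function(i) {
      p <- pattern[[i]]
      (length(p) == 1L && is.na(p)) || identical(p, tuple[[i]])
    }, logical(1)))
}

# Index of the first matching tuple, or NA if there is none.
find_match <- function(pattern) {
  hits <- which(vapply(space, function(t) matches(pattern, t), logical(1)))
  if (length(hits) == 0L) NA_integer_ else hits[1L]
}

# rd: return a matching tuple, leaving it in the space.
ts_rd <- function(...) {
  i <- find_match(list(...))
  if (is.na(i)) NULL else space[[i]]
}

# in: return a matching tuple and remove it from the space.
ts_in <- function(...) {
  i <- find_match(list(...))
  if (is.na(i)) return(NULL)
  tuple <- space[[i]]
  space[[i]] <<- NULL
  tuple
}

ts_out("task", 1, "sqrt(16)")        # producer deposits work and moves on
seen  <- ts_rd("task", NA, NA)       # a reader inspects it without consuming
taken <- ts_in("task", 1, NA)        # a consumer claims it later
gone  <- ts_rd("task", NA, NA)       # nothing left to match
```

The producer never names a consumer (space uncoupling), and the tuple can be claimed at any later point (time uncoupling).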
What's the status?
- Currently alpha
- Communication framework is almost complete and fully asynchronous
- Currently building data structures on top of the framework
- distributed.vector
- distributed.data.frame
- Working on distributed matrices (mixed sparse and dense) this summer
- We have a persistence model
Further Information:
- ?parallel:::mcparallel
- Simon's background: http://www.rforge.net/background/
- The cnidaria package: https://github.com/kaneplusplus/cnidaria
- NIPS 2013 paper: http://biglearn.org/index.php/Papers
- Next week at the NYC Data Science Meetup.
- Email me with questions at michael dot kane at yale dot edu.
RFinance 2014
By Michael Kane
My 5 minute talk for RFinance