Distributed Data Structures in R for General, Large-Scale Computing
Michael J. Kane
Phronesis, LLC and Yale University
Acknowledgements
Simon Urbanek and AT&T Research Labs
A portion of this research is based on research sponsored by DARPA under award FA8750-12-2-0324.
The U.S. Government is authorized to reproduce and distribute reprints for Governmental
purposes notwithstanding any copyright notation thereon.
Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted
as necessarily representing the official policies or endorsements, either expressed or implied, of
DARPA or the U.S. Government.
Current Prototypical Communication For Distributed Computing
Redis
- Workers block on queues (lists)
- Consume computations
- Return results on another queue
MPI
- Data movement is defined up-front
- Processes communicate directly
[Figure labels: Communication, Orthogonality, Performance]
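The Redis pattern above (workers block on a queue, consume computations, and return results on another queue) can be sketched in plain R. This is a minimal single-process illustration, not the framework's code: `make_queue` and `worker` are hypothetical helpers, and in-memory lists stand in for Redis lists, so nothing here actually blocks or crosses a network.

```r
# In-memory stand-in for a Redis list used as a FIFO queue.
# (Hypothetical helper; a real deployment would push to and block-pop
# from lists held by a Redis server.)
make_queue <- function() {
  items <- list()
  list(
    push = function(x) {
      items[[length(items) + 1L]] <<- x
      invisible(NULL)
    },
    pop = function() {
      if (length(items) == 0L) return(NULL)  # a real worker would block here
      x <- items[[1L]]
      items <<- items[-1L]
      x
    },
    size = function() length(items)
  )
}

work_queue   <- make_queue()
result_queue <- make_queue()

# Producer: enqueue computations as unevaluated expressions.
for (i in 1:3) {
  work_queue$push(list(id = i, expr = bquote(.(i)^2)))
}

# Worker: consume computations and return results on another queue.
worker <- function(work, results) {
  while (!is.null(job <- work$pop())) {
    results$push(list(id = job$id, value = eval(job$expr)))
  }
}
worker(work_queue, result_queue)

# Collect the results in the order they were produced.
results <- list()
while (!is.null(r <- result_queue$pop())) {
  results[[length(results) + 1L]] <- r
}
values <- vapply(results, function(r) r$value, numeric(1))
```

The producer never addresses a particular worker; it only knows the queue. That decoupling is what the queue-based approach buys, at the cost of routing every message through the queue.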
How does this work?
How is it used?
- pull( <resource>, <expression> ) - executes an expression at a resource location and returns the result
- push( <resource>, <expression> ) - executes an expression at a resource location and returns the name of a new resource
C <- push( "B", "B %*% pull('A', 'A')" )
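The `pull`/`push` semantics above can be mimicked in a single R process to see how the example composes. This is only a sketch under loud assumptions: one environment (`resources`) plays the role of every resource location, resource names are simply variable names in it, and the naming scheme for new resources is made up. The real framework evaluates expressions where the data lives; this toy version does not distribute anything.

```r
# Hypothetical single-process stand-in: one environment holds all resources.
resources <- new.env()

# pull: evaluate an expression "at" the resource and return the result.
pull <- function(resource, expression) {
  stopifnot(exists(resource, envir = resources, inherits = FALSE))
  eval(parse(text = expression), envir = resources)
}

# push: evaluate an expression "at" the resource, store the result as a
# new resource, and return the new resource's name.
push <- function(resource, expression) {
  value <- pull(resource, expression)
  name <- paste0("resource_", length(ls(resources)) + 1L)  # made-up naming
  assign(name, value, envir = resources)
  name
}

# Register two matrices as resources named "A" and "B".
assign("A", matrix(1:4, 2, 2), envir = resources)
assign("B", diag(2), envir = resources)

# The slide's example: A is pulled to where B lives, the product is
# computed there, and only the name of the result comes back.
C <- push("B", "B %*% pull('A', 'A')")
product <- pull(C, C)  # fetch the stored product by name
```

Note that `C` holds a resource name, not the matrix itself; the data moves only when something pulls it.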
A Generative Communication Framework (cf. Gelernter 1985)
The bottom line: you can support a much larger class of distributed communication patterns.
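For readers unfamiliar with generative communication, the idea from Gelernter's Linda is that processes communicate by placing tuples into a shared space, where they exist independently of their producer until some process reads or removes them by pattern. The sketch below illustrates that general idea in one R process; it is not the framework described here, and the names `ts_out`, `ts_rd`, and `ts_take` are hypothetical (Linda's `in` is a reserved word in R, hence `ts_take`). Real Linda operations block until a match exists; these return `NULL` instead.

```r
# Minimal single-process tuple space (illustrative only).
space <- new.env()
space$tuples <- list()

# out: add a tuple to the space.
ts_out <- function(...) {
  space$tuples[[length(space$tuples) + 1L]] <- list(...)
  invisible(NULL)
}

# A pattern matches a tuple of equal length when every non-NULL pattern
# field is identical to the tuple field; NULL acts as a wildcard.
matches <- function(tuple, pattern) {
  if (length(tuple) != length(pattern)) return(FALSE)
  for (i in seq_along(pattern)) {
    p <- pattern[[i]]
    if (!is.null(p) && !identical(tuple[[i]], p)) return(FALSE)
  }
  TRUE
}

# Index of the first matching tuple, or NA if there is none.
find_match <- function(pattern) {
  for (i in seq_along(space$tuples)) {
    if (matches(space$tuples[[i]], pattern)) return(i)
  }
  NA_integer_
}

# rd: read a matching tuple without removing it.
ts_rd <- function(...) {
  i <- find_match(list(...))
  if (is.na(i)) NULL else space$tuples[[i]]
}

# take: read and remove a matching tuple.
ts_take <- function(...) {
  i <- find_match(list(...))
  if (is.na(i)) return(NULL)
  tuple <- space$tuples[[i]]
  space$tuples[[i]] <- NULL
  tuple
}

# A producer emits tasks; any worker may claim one and emit a result.
ts_out("task", 1, "sqrt(16)")
ts_out("task", 2, "2 + 3")

job <- ts_take("task", NULL, NULL)
ts_out("result", job[[2]], eval(parse(text = job[[3]])))
res <- ts_rd("result", 1, NULL)
```

Because neither side names the other, queue-style and direct-style exchanges can both be expressed as patterns over the same space.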
What's the status?
- Communication framework is almost complete and fully asynchronous
- Currently building data structures on top of the framework
- distributed.vector
- distributed.data.frame
- Working on distributed matrices (mixed sparse and dense) this summer
- We have a persistence model
Further Information:
In R type:
?parallel:::mcparallel
- Next week at the NYC Data Science Meetup.
- Email me with questions at michael dot kane at yale dot edu.