Michael Kane
Yale University and Phronesis LLC
"The goal of this workshop is to standardize the API for exposing distributed computing in R, learn from the experiences of attendees in using R for large scale analysis, and collaborate in open source."
1. Compute cycles are cheaper than brain cycles.
"Because you're a C++ programmer, there's an above-average chance you're a performance freak. If you're not you're still probably sympathetic to their point of view. (If you're not at all interested in performance, shouldn't you be in the Python room down the hall?)"
-- Scott Meyers, Effective Modern C++
Performance freaks value task execution time over their free time.
Horizontal scalability beats vertical scalability when compute is a commodity
When performing an analysis, brain cycles are better spent on analysis, not implementation.
library(foreach)
library(iterators)
library(doMC)
registerDoMC(cores = 1024)
ans = foreach(it=make_data_gen()) %dopar% {
  process_data(it)
}
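The loop above leaves make_data_gen() and process_data() undefined. A minimal runnable sketch follows, assuming hypothetical stand-ins for both (an iterator over four chunks of integers and a per-chunk sum) and using the sequential backend that ships with foreach, so it runs without doMC or a multicore machine. These definitions are illustrative only and are not the ones used in the talk.

```r
library(foreach)
library(iterators)

# Hypothetical stand-in for make_data_gen(): an iterator over chunks of data.
make_data_gen <- function(n_chunks = 4, chunk_size = 25) {
  values <- seq_len(n_chunks * chunk_size)
  chunk_id <- rep(seq_len(n_chunks), each = chunk_size)
  iter(split(values, chunk_id))  # yields one chunk per iteration
}

# Hypothetical stand-in for process_data(): reduce a chunk to one number.
process_data <- function(chunk) sum(chunk)

# Sequential backend from foreach; replacing this single line with
# registerDoMC(cores = ...) runs the same loop in parallel.
registerDoSEQ()

ans <- foreach(it = make_data_gen()) %dopar% {
  process_data(it)
}

# foreach returns a list with one result per chunk; combine them.
total <- Reduce(`+`, ans)
```

Only the backend registration changes between the sequential and parallel versions; the loop body and the data generator stay the same, which is the sense in which brain cycles go to the analysis and not the implementation.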
Get a bigger machine
Talk to Bryan Lewis
"The greatest value of a picture is when it forces us to notice what we never expected to see." --John Tukey
Manage complexity/abundance with interactivity
Organize data by:
"It seems natural to call such computer guided diagnostics cognostics. We must learn to choose them, calculate them, and use them. Else we drown in a sea of many displays" -John Tukey
Figure from Explaining and Harnessing Adversarial Examples by Goodfellow et al.
These images are classified with >99.6% confidence as the shown class by a Convolutional Network.