Grammars and Structures for Computing with Data
IEEE Top Programming Languages*
What about R works when computing with data?
It's not the performance benchmarks.
*Shamelessly taken from https://dl.dropboxusercontent.com/u/20315677/r-meetup-2016-slides.pdf
Jan Vitek's Python/R Benchmarks
The Streetlight Effect*: a type of observational bias in which people search for something only where it is easiest to look.
* David H. Freedman (August 1, 2010). "The Streetlight Effect". Discover magazine.
Your data are probably not that big.
and when they are...
horizontal scalability beats vertical scalability
"Because you're a C++ programmer, there's an above-average chance you're a performance freak. If you're not, you're still probably sympathetic to their point of view. (If you're not at all interested in performance, shouldn't you be in the [interpreted language] room down the hall?)"
-- Scott Meyers, Effective Modern C++
R's syntax values development time over run time.
R makes it easy to create dialects. Some of them are even useful.
A dialect of a programming language or a data exchange language is a (relatively small) variation or extension of the language that does not change its intrinsic nature. - Wikipedia
create_summary(analyze(clean(load(file_name))))
library(magrittr)
file_name %>% load %>% clean %>%
analyze %>% create_summary
Borrowing "Pipes"
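To make the comparison concrete, here is a minimal sketch in base R showing that the nested form and the piped form compute the same thing. Everything in it is an assumption for illustration: `load_data`, `clean`, `analyze`, and `create_summary` are hypothetical stand-ins for the functions named on the slide, and the `%>%` defined here is a simplified one-line imitation of the magrittr operator, not its actual implementation.

```r
# Simplified stand-in for magrittr's %>% (an assumption, not the real code):
# pass the left-hand value to the function named on the right.
# It only handles bare function names, which is all the slide uses.
`%>%` <- function(lhs, rhs) rhs(lhs)

# Hypothetical pipeline stages; the "file" contents are made up.
load_data <- function(file_name) c(3, NA, 1, 4, NA, 5)
clean <- function(x) x[!is.na(x)]                      # drop missing values
analyze <- function(x) list(n = length(x), mean = mean(x))
create_summary <- function(stats) {
  sprintf("n = %d, mean = %.2f", stats$n, stats$mean)
}

file_name <- "data.csv"  # placeholder name; never actually read

# Nested calls read inside-out.
nested <- create_summary(analyze(clean(load_data(file_name))))

# The piped form reads left to right, in the order the steps run.
piped <- file_name %>% load_data %>% clean %>%
  analyze %>% create_summary

print(identical(nested, piped))
```

That a usable infix operator fits on one line is part of why dialects like this are cheap to build in R. In practice you would load magrittr rather than define your own; since R 4.1 there is also a native `|>` pipe, which requires explicit calls such as `x |> f()`.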
Compared with other languages, R values user time over developer time*
*Paraphrasing Jiahao Chen
> citation("bigmemory")
To cite bigmemory in publications use:
Michael J. Kane, John Emerson, Stephen Weston (2013). Scalable
Strategies for Computing with Massive Data. Journal of Statistical
Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
A BibTeX entry for LaTeX users is
@Article{,
title = {Scalable Strategies for Computing with Massive Data},
author = {Michael J. Kane and John Emerson and Stephen Weston},
journal = {Journal of Statistical Software},
year = {2013},
volume = {55},
number = {14},
pages = {1--19},
url = {http://www.jstatsoft.org/v55/i14/},
}
Keep Data Science Weird
Acknowledgements
- Simon Urbanek
- Ryan Hafen
- Jiahao Chen
- Jan Vitek
- Jake VanderPlas
- Andy Terrell
- Peter Wang
- Travis Oliphant
- Duncan Temple-Lang
JSM 2016
By Michael Kane