Grammars and Structures for Computing with Data

IEEE Top Programming Languages*

What about R works when computing with data?

 

It's not the performance benchmarks.

Jan Vitek's Python/R Benchmarks

The Streetlight Effect*: a type of observational bias in which people search for something only where it is easiest to look.

* David H. Freedman (August 1, 2010). "The Streetlight Effect". Discover magazine.

Your data are probably not that big.

horizontal scalability beats vertical scalability

and when they are...

"Because you're a C++ programmer, there's an above-average chance you're a performance freak. If you're not, you're still probably sympathetic to their point of view. (If you're not at all interested in performance, shouldn't you be in the [interpreted language] room down the hall?)"
-- Scott Meyers, Effective Modern C++

R's syntax values development time over run time.

 

R makes it easy to create dialects. Some of them are even useful.

"A dialect of a programming language or a data exchange language is a (relatively small) variation or extension of the language that does not change its intrinsic nature."
-- Wikipedia

# Nested calls read inside-out.
create_summary(analyze(clean(load(file_name))))

# The same pipeline with magrittr's pipe reads left to right.
library(magrittr)

file_name %>% load %>% clean %>%
  analyze %>% create_summary

Borrowing "Pipes"
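To make the "dialect" point concrete, here is a minimal sketch in base R of how little it takes to add a pipe-like operator. Everything in it is an assumption for illustration: `%then%` is a hypothetical operator (not magrittr's `%>%`, which handles many more cases), and `load_data`, `clean`, `analyze`, and `create_summary` are toy stand-ins for the pipeline stages named above.

```r
# Hypothetical stand-ins for the pipeline stages; none of these come
# from a real package. load_data avoids masking base::load.
load_data <- function(file_name) c(3, NA, 1, 2)
clean <- function(x) x[!is.na(x)]
analyze <- function(x) sort(x)
create_summary <- function(x) paste(x, collapse = ", ")

# A simplified pipe: hand the left-hand value to the right-hand function.
# Any function named %something% can be used as an infix operator in R.
`%then%` <- function(lhs, rhs) rhs(lhs)

# The nested form and the piped form compute the same result.
nested <- create_summary(analyze(clean(load_data("data.csv"))))
piped <- "data.csv" %then% load_data %then% clean %then%
  analyze %then% create_summary

identical(nested, piped)
```

User-defined infix operators are evaluated left to right, which is why the chain reads in the order the steps run.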

Compared to other languages, R values user time over developer time.*

*Paraphrasing Jiahao Chen

> citation("bigmemory")

To cite bigmemory in publications use:

  Michael J. Kane, John Emerson, Stephen Weston (2013). Scalable
  Strategies for Computing with Massive Data. Journal of Statistical
  Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.

A BibTeX entry for LaTeX users is

  @Article{,
    title = {Scalable Strategies for Computing with Massive Data},
    author = {Michael J. Kane and John Emerson and Stephen Weston},
    journal = {Journal of Statistical Software},
    year = {2013},
    volume = {55},
    number = {14},
    pages = {1--19},
    url = {http://www.jstatsoft.org/v55/i14/},
  }

Keep Data Science Weird

Acknowledgements

  • Simon Urbanek

  • Ryan Hafen

  • Jiahao Chen

  • Jan Vitek

  • Jake VanderPlas

  • Andy Terrell

  • Peter Wang

  • Travis Oliphant

  • Duncan Temple-Lang

JSM 2016

By Michael Kane
