Generalized function composition and pipe construction with the fc package
Susan Wang (xiaofei.wang@yale.edu)
and
Michael Kane (michael.kane@yale.edu)
This talk is about writing functions that construct functions in R
*Shamelessly taken from https://dl.dropboxusercontent.com/u/20315677/r-meetup-2016-slides.pdf
Jan Vitek's Python/R Benchmarks
The Streetlight Effect*: a type of observational bias in which people search for something only where it is easiest to look.
* David H. Freedman (August 1, 2010). "The Streetlight Effect". Discover magazine.
R's syntax values development time over run time.
The (forward) pipe operator
> library(magrittr)
>
> iris %>% head() %>% tail(n=5)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
What's Non-Standard About it?
> library(magrittr)
>
> iris %>% head() %>% tail(n=5)
Are we really just talking about NSE?
What's going on?
> head %>% tail(n=5)
1 function (x, ...)
2 UseMethod("head")
What's going on?
> class(head %>% tail(n=5))
[1] "noquote"
What's going on?
> str(unclass(head %>% tail(n=5)))
chr [1:2, 1] "function (x, ...) " "UseMethod(\"head\")"
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "1" "2"
..$ : chr ""
> deparse(head)
[1] "function (x, ...) " "UseMethod(\"head\")"
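The output above is less mysterious once the pipe is unwound: `head %>% tail(n=5)` passes the function `head` itself as the first argument to `tail`. Base R's `tail` has a method for functions that deparses the function into a one-column character matrix and returns it as a `noquote` object. A base-R sketch of the same call, with no magrittr involved (the name `res` is only for illustration):

```r
# tail() applied to a function deparses it and keeps the last n lines.
res <- tail(head, n = 5)

# The result is a "noquote" character matrix, not a function.
class(res)

# Stripping the class and dimensions leaves the deparsed source of head().
as.vector(unclass(res))
```

So nothing was composed here; the pipe simply evaluated `tail(head, n = 5)`.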
Take 2: Build a magrittr function
> foo <- . %>% head %>% tail(n=5)
> foo
Functional sequence with the following components:
1. head(.)
2. tail(., n = 5)
Use 'functions' to extract the individual functions.
Take 2: Build a magrittr function
> unclass(foo)
function (value)
freduce(value, `_function_list`)
<environment: 0x7f9420ac5b20>
> freduce
function (value, function_list)
{
k <- length(function_list)
if (k > 1) {
for (i in 1:(k - 1L)) {
value <- function_list[[i]](value)
}
}
value <- withVisible(function_list[[k]](value))
if (value[["visible"]])
value[["value"]]
else invisible(value[["value"]])
}
<bytecode: 0x7f94203eba68>
<environment: namespace:magrittr>
Take 2: Build a magrittr function
> ls(environment(foo))
[1] "_fseq" "_function_list" "freduce"
> (environment(foo))[['_function_list']]
[[1]]
function (.)
head(.)
[[2]]
function (.)
tail(., n = 5)
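The loop in `freduce` is a left fold over a list of one-argument functions. A minimal base-R sketch of the same idea using `Reduce` follows; the names `apply_sequence` and `function_list` are illustrative assumptions, not magrittr API, and the sketch ignores the visibility handling that `freduce` does with `withVisible`.

```r
# A list of one-argument functions, mirroring `_function_list` above.
function_list <- list(
  function(.) head(.),
  function(.) tail(., n = 5)
)

# Hypothetical helper: feed `value` through each function in order.
apply_sequence <- function(value, function_list) {
  Reduce(function(acc, f) f(acc), function_list, value)
}

result <- apply_sequence(iris, function_list)

# Same rows as the nested call.
identical(result, tail(head(iris), n = 5))
```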
What about...
> . <- iris
> . %>% head
Functional sequence with the following components:
1. head(.)
Use 'functions' to extract the individual functions.
magrittr applies an input to a function, saves the intermediate result as
'.', and sends it to the next function.
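The "save the intermediate as '.'" step can be imitated by hand in base R. This is only a sketch of the idea, not how magrittr is implemented; `step1` and `step2` are illustrative names.

```r
# Evaluate the first stage normally.
step1 <- head(iris)

# Evaluate the next stage with `.` bound to the intermediate value.
step2 <- eval(quote(tail(., n = 5)), list(. = step1))

identical(step2, tail(head(iris), n = 5))
```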
pipes do 2.5 things
1. partial function evaluation
2. function composition
2.5 generalized function composition
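The three items can be written out in plain R. The sketch below assumes one reading of "generalized": the composed value may feed any named argument, not only the first. All function names here (`tail5`, `compose2`, `foo_composed`, `n_from_data`) are illustrative.

```r
# 1. Partial function evaluation: fix n = 5, leave x open.
tail5 <- function(x) tail(x, n = 5)

# 2. Function composition: apply f, then g.
compose2 <- function(f, g) function(x) g(f(x))
foo_composed <- compose2(head, tail5)

# 2.5 Generalized composition (assumed reading): the inner result is
# routed to an argument other than the first.
n_from_data <- function(x) head(letters, n = nrow(x))

foo_composed(iris)
n_from_data(head(iris))
```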
Back to the example
> foo <- . %>% head %>% tail(n=5)
>
> # is equivalent to
>
> foo <- function(x) {
+ tail(head(x), n=5)
+ }
Why might someone prefer the latter?
1. We get a regular, readable, stack-traceable R function.
2. It's easier for the bytecode interpreter to optimize.
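Both points can be checked on the plain function. In this sketch, `body()` shows that the function is ordinary R code, and `compiler::cmpfun()` (from the compiler package that ships with R) byte-compiles it explicitly; `foo_compiled` is an illustrative name.

```r
foo <- function(x) {
  tail(head(x), n = 5)
}

# The body is ordinary, inspectable R code.
body(foo)

# Explicit byte compilation; the JIT normally does this automatically.
foo_compiled <- compiler::cmpfun(foo)

identical(foo_compiled(iris), foo(iris))
```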
The fc package
An fc function
> fc(tail, x = head(x), n = 5)
function (x)
{
tail(x = head(x), n = 5)
}
or...
> fc(tail, x = head(y), n = 5)
function (y)
{
tail(x = head(y), n = 5)
}
codetools
but be careful...
> fc(tail, y = head(x), n = 5)
function (x)
{
tail(x, y = head(x), n = 5)
}
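To make the idea concrete, here is a rough base-R sketch of how a function like the ones printed above could be constructed. It is not the fc implementation: `make_fc_like` is a hypothetical helper, it takes the function name as a string, it finds free variables with `all.vars`, and it does not reproduce the behaviour in the "be careful" case, where fc also passes the new argument through in first position.

```r
# Hypothetical helper, not part of fc: build function(<free vars>) { f(...) }.
make_fc_like <- function(f_name, ...) {
  # Capture the argument expressions without evaluating them.
  args <- as.list(substitute(list(...)))[-1]
  # Variables mentioned in the expressions become the formals.
  free <- unique(unlist(lapply(args, all.vars), use.names = FALSE))
  # Build the call f(<args>) and wrap it in braces.
  inner <- as.call(c(list(as.name(f_name)), args))
  formals_list <- rep(alist(placeholder = ), length(free))
  names(formals_list) <- free
  as.function(c(formals_list, list(call("{", inner))), envir = parent.frame())
}

g <- make_fc_like("tail", x = head(x), n = 5)
h <- make_fc_like("tail", x = head(y), n = 5)

g          # a plain function of x
names(formals(h))
```

Because the result is an ordinary closure, it can be printed, debugged, and byte-compiled like any hand-written function.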
Implementing %>%
Infix operators like %>% are read left to right.
We can't implement it exactly. Instead of the first line below, you need to write the second:
> iris %>% head() %>% tail(n=5)
> ( head() %>% fc(tail, n=5) )(iris)
Implementing %>%
...or
> foo <- head() %>% fc(tail, n=5)
> foo(iris)
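The left-to-right constraint is easier to see with a toy operator. The sketch below defines a hypothetical `%then%` (not fc's or magrittr's `%>%`) that composes two functions; since custom infix operators associate left to right, a chain builds the composition one pair at a time, and the data is supplied only at the end.

```r
# Hypothetical composition operator: apply f, then g.
`%then%` <- function(f, g) function(x) g(f(x))

tail5 <- function(x) tail(x, n = 5)

# Parsed as (head %then% tail5) %then% nrow.
count_rows <- head %then% tail5 %then% nrow

foo <- head %then% tail5
foo(iris)
count_rows(iris)
```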
A note on anonymous functions
> fc(head, x = fc(head, n=1)(x))
Benchmarks
> library(microbenchmark)
>
>
> log_sqrt_f <- function(x) log(x=sqrt(x))
> log_sqrt_compose <- purrr::compose(log, sqrt)
> `%>%` <- magrittr::`%>%`
> log_sqrt_pipe <- . %>% sqrt %>% log
> log_sqrt_fc <- fc(log, x=sqrt(x))
>
> microbenchmark::microbenchmark(log_sqrt_f(10),
+ log_sqrt_compose(10),
+ log_sqrt_pipe(10),
+ log_sqrt_fc(10), times = 10000)
Unit: nanoseconds
expr min lq mean median uq max neval
log_sqrt_f(10) 394 495.0 882.1610 651 722 1404194 10000
log_sqrt_compose(10) 3199 3691.0 4923.6965 3961 4620 2716603 10000
log_sqrt_pipe(10) 2906 3441.5 4556.6262 3761 4181 1821010 10000
log_sqrt_fc(10) 389 493.0 840.0981 654 724 1044451 10000
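To read the table at a glance, the median timings can be expressed relative to the hand-written function. The numbers below are copied from the output above; the vector name `medians_ns` is illustrative.

```r
# Median timings in nanoseconds, copied from the benchmark output.
medians_ns <- c(f = 651, compose = 3961, pipe = 3761, fc = 654)

# Slowdown relative to the hand-written function.
ratios <- round(medians_ns / medians_ns[["f"]], 1)
ratios
```

On this run the purrr and magrittr versions are roughly six times slower at the median, while the fc version is indistinguishable from the hand-written one.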
fc as a middle layer for magrittr?
Thanks
Cleveland R User Talk
By Michael Kane