Generalized function composition and pipe construction with the fc package

Susan Wang (xiaofei.wang@yale.edu)

and

Michael Kane (michael.kane@yale.edu)

This talk is about writing functions that construct functions in R

Jan Vitek's Python/R Benchmarks

The Streetlight Effect*: a type of observational bias in which people search for something only where it is easiest to look.

* David H. Freedman (August 1, 2010). "The Streetlight Effect". Discover magazine.

R's syntax values development time over run time.

 

The (forward) pipe operator

> library(magrittr)
> 
> iris %>% head() %>% tail(n=5)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

What's Non-Standard About it?

> library(magrittr)
> 
> iris %>% head() %>% tail(n=5)

Are we really just talking about NSE?

What's going on?

> head %>% tail(n=5)

1 function (x, ...)
2 UseMethod("head")

What's going on?

> class(head %>% tail(n=5))
[1] "noquote"

What's going on?

> str(unclass(head %>% tail(n=5)))
 chr [1:2, 1] "function (x, ...) " "UseMethod(\"head\")"
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "1" "2"
  ..$ : chr ""

> deparse(head)
[1] "function (x, ...) "  "UseMethod(\"head\")"
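
Read as an ordinary call, the piped expression hands the function object head to tail() as data. The following is a minimal base R sketch of that reading, with no magrittr required; the exact printed form of the result can differ between R versions.

```r
# The pipe inserts its left-hand side as the first argument of the
# right-hand call, so the function object `head` is passed to tail() as data.
piped <- tail(head, n = 5)

# Deparsing a function gives its source as text, one element per line.
src <- deparse(head)

is.function(head)    # TRUE: head itself is a function
is.function(piped)   # FALSE: the piped result is text, not a function
is.character(src)    # TRUE
```

So nothing was composed here: the result describes head rather than wrapping it.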

Take 2: Build a magrittr function

> foo <- . %>% head %>% tail(n=5)
> foo
Functional sequence with the following components:

 1. head(.)
 2. tail(., n = 5)

Use 'functions' to extract the individual functions.

Take 2: Build a magrittr function

> unclass(foo)
function (value)
freduce(value, `_function_list`)
<environment: 0x7f9420ac5b20>
> freduce
function (value, function_list)
{
    k <- length(function_list)
    if (k > 1) {
        for (i in 1:(k - 1L)) {
            value <- function_list[[i]](value)
        }
    }
    value <- withVisible(function_list[[k]](value))
    if (value[["visible"]])
        value[["value"]]
    else invisible(value[["value"]])
}
<bytecode: 0x7f94203eba68>
<environment: namespace:magrittr>

Take 2: Build a magrittr function

> ls(environment(foo))
[1] "_fseq"          "_function_list" "freduce"

> (environment(foo))[['_function_list']]
[[1]]
function (.)
head(.)

[[2]]
function (.)
tail(., n = 5)
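
The structure above (a list of one-argument functions applied in order) can be imitated in base R. This is a sketch only: apply_in_sequence and foo_sketch are hypothetical names, and Reduce() stands in for the explicit loop in freduce.

```r
# Hypothetical helper: apply a list of one-argument functions in order,
# threading the running value through each one.
apply_in_sequence <- function(value, function_list) {
  Reduce(function(acc, f) f(acc), function_list, value)
}

# The same two steps that magrittr stored in `_function_list`.
function_list <- list(
  function(.) head(.),
  function(.) tail(., n = 5)
)

# A closure over the list, playing the role of the functional sequence.
foo_sketch <- function(value) apply_in_sequence(value, function_list)

identical(foo_sketch(iris), tail(head(iris), n = 5))
```

Every call to foo_sketch walks the list again, which is the per-call overhead the rest of the talk is concerned with.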

What about...

> . <- iris
> . %>% head
Functional sequence with the following components:

 1. head(.)

Use 'functions' to extract the individual functions.

magrittr applies an input to a function, saves the intermediate result as
'.', and sends it to the next function.

pipes do 2.5 things

1. partial function evaluation

2. function composition

2.5 generalized function composition
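
The three items can be written out by hand in base R. This is an illustrative sketch: tail5, compose2 and rep_by_length are hypothetical names, and reading "generalized" as "the inner result may fill any argument, not only the first" is an assumption.

```r
# 1. Partial evaluation: fix n = 5, leaving a one-argument function.
tail5 <- function(x) tail(x, n = 5)

# 2. Composition: feed the result of one function to the next.
compose2 <- function(f, g) function(x) g(f(x))   # hypothetical helper
head_then_tail <- compose2(head, tail5)

# 2.5 Generalized composition (assumed reading): the inner call supplies
# the `times` argument of rep(), while x itself is passed through as-is.
rep_by_length <- function(x) rep(x, times = length(x))

identical(head_then_tail(iris), tail(head(iris), n = 5))
rep_by_length(1:3)   # 1 2 3 1 2 3 1 2 3
```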

Back to the example

> foo <- . %>% head %>% tail(n=5)
> 
> # is equivalent to
>
> foo <- function(x) {
+   tail(head(x), n=5)
+ }

Why might someone prefer the latter?

1. We get a regular, readable, stack-traceable R function.


2. It's easier for the bytecode interpreter to optimize.
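
Both points can be checked with base R alone. The sketch below builds the function by assigning a quoted body, which is one possible way to construct it and not necessarily how fc does it, and then passes it to compiler::cmpfun().

```r
# Start from an empty one-argument function and fill in its body.
foo <- function(x) NULL
body(foo) <- quote({
  tail(head(x), n = 5)
})

# The result deparses like hand-written code, so it reads normally
# when printed or when it shows up in a traceback.
src <- deparse(body(foo))

# A plain closure can be handed to the byte-code compiler directly.
foo_compiled <- compiler::cmpfun(foo)

identical(foo_compiled(iris), tail(head(iris), n = 5))
```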

The fc package

An fc function

> fc(tail, x = head(x), n = 5)
function (x)
{
    tail(x = head(x), n = 5)
}

or...

> fc(tail, x = head(y), n = 5)
function (y)
{
    tail(x = head(y), n = 5)
}

codetools

but be careful...

> fc(tail, y = head(x), n = 5)
function (x)
{
    tail(x, y = head(x), n = 5)
}
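
In the examples above, the argument of the generated function follows the variable left unbound in the expression (x or y). A rough base R sketch of that idea follows. build_fn is a hypothetical helper, and all.vars() is used here as a simple stand-in; the slides do not show how fc itself finds these variables.

```r
# Hypothetical helper: turn an expression into a function whose arguments
# are the variables the expression refers to.
build_fn <- function(body_expr, envir = globalenv()) {
  arg_names <- all.vars(body_expr)              # "y" for head(y)
  args <- rep(alist(x = ), length(arg_names))   # empty (required) formals
  names(args) <- arg_names
  as.function(c(args, list(body_expr)), envir = envir)
}

f <- build_fn(quote(tail(x = head(y), n = 5)))
names(formals(f))   # "y"
identical(f(iris), tail(head(iris), n = 5))
```

One limitation of this sketch: all.vars() reports every variable name in the expression, so a name meant to refer to a global object would also become an argument.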

Implementing %>%

Infix operators like %>% are read left to right.

 

Composition is performed right to left.

 

We can't implement.

 

 

You need to write

> iris %>% head() %>% tail(n=5)
> ( head() %>% fc(tail, n=5) )(iris)

Implementing %>%

...or

> foo <- head() %>% fc(tail, n=5) 
> foo(iris)
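
The left-to-right reading can be reconciled with composition by letting an infix operator return a function. The operator below, %then%, is a hypothetical illustration written in base R; it is not the %>% exported by fc or magrittr.

```r
# Hypothetical operator: build a function that applies f first, then g.
`%then%` <- function(f, g) function(...) g(f(...))

tail5 <- function(x) tail(x, n = 5)
foo <- head %then% tail5

# Custom %op% operators are left-associative, so a longer chain groups as
# (sqrt %then% log) %then% round and still applies sqrt first.
bar <- sqrt %then% log %then% round

identical(foo(iris), tail(head(iris), n = 5))
identical(bar(10), round(log(sqrt(10))))
```

Each link in the chain adds a closure call, whereas the fc examples above emit a single function with the nested call written out.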

A note on anonymous functions

> fc(head, x = fc(head, n=1)(x))

Benchmarks

> library(microbenchmark)
>
>
> log_sqrt_f <- function(x) log(x=sqrt(x))
> log_sqrt_compose <- purrr::compose(log, sqrt)
> `%>%` <- magrittr::`%>%`
> log_sqrt_pipe <- . %>% sqrt %>% log
> log_sqrt_fc <- fc(log, x=sqrt(x))
>
> microbenchmark::microbenchmark(log_sqrt_f(10),
+                                log_sqrt_compose(10), 
+                                log_sqrt_pipe(10),
+                                log_sqrt_fc(10), times = 10000)
Unit: nanoseconds
                 expr  min     lq      mean median   uq     max neval
       log_sqrt_f(10)  394  495.0  882.1610    651  722 1404194 10000
 log_sqrt_compose(10) 3199 3691.0 4923.6965   3961 4620 2716603 10000
    log_sqrt_pipe(10) 2906 3441.5 4556.6262   3761 4181 1821010 10000
      log_sqrt_fc(10)  389  493.0  840.0981    654  724 1044451 10000

fc as a middle layer for magrittr?

Thanks

Cleveland R User Talk

By Michael Kane
