Advanced R Exercise Solution (6)

Functional Programming

Anonymous functions

Given a name, like “mean”, match.fun() lets you find a function. Given a function, can you find its name? Why doesn’t that make sense in R?

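It doesn’t make sense because R has no special syntax for creating a named function: a function is an ordinary object, and you use the regular assignment operator to bind it to a name. The same function can therefore be bound to several names, or to none at all, so there is no single name to recover.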
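The asymmetry can be seen in a short sketch: match.fun() goes from a name to a function, but nothing stops one function object from having several names or none. The alias average below is made up for illustration.

```r
# match.fun() maps a name (a string) to the function bound to it.
f <- match.fun("mean")
identical(f, mean)        # TRUE

# One function object can be bound to several names ...
average <- mean           # `average` is a hypothetical alias
identical(average, mean)  # TRUE

# ... or to no name at all: an anonymous function called in place.
(function(x) x + 1)(2)    # 3
```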

Use lapply() and an anonymous function to find the coefficient of variation (the standard deviation divided by the mean) for all columns in the mtcars dataset.

lapply(mtcars, function(x) sd(x) / mean(x))

## $mpg
## [1] 0.2999881
## 
## $cyl
## [1] 0.2886338
## 
## $disp
## [1] 0.5371779
## 
## $hp
## [1] 0.4674077
## 
## $drat
## [1] 0.1486638
## 
## $wt
## [1] 0.3041285
## 
## $qsec
## [1] 0.1001159
## 
## $vs
## [1] 1.152037
## 
## $am
## [1] 1.228285
## 
## $gear
## [1] 0.2000825
## 
## $carb
## [1] 0.5742933
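When a named numeric vector is handier than a list, the same anonymous function can be handed to vapply() instead. This is only an alternative sketch, not part of the exercise; the name cv is chosen here for illustration.

```r
# vapply() checks that every result is a single number and returns a
# named numeric vector rather than a list.
cv <- vapply(mtcars, function(x) sd(x) / mean(x), numeric(1))
round(cv, 3)
```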

Use integrate() and an anonymous function to find the area under the curve for the following functions. Use Wolfram Alpha to check your answers.

y = x ^ 2 - x, x in [0, 10]

integrate(function(x) x ^ 2 - x, 0, 10)

## 283.3333 with absolute error < 3.1e-12

y = sin(x) + cos(x), x in [-π, π]

integrate(function(x) sin(x) + cos(x), -pi, pi)

## 2.615901e-16 with absolute error < 6.3e-14

y = exp(x) / x, x in [10, 20]

integrate(function(x) exp(x) / x, 10, 20)

## 25613160 with absolute error < 2.8e-07
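Besides Wolfram Alpha, the first two integrals can be checked against their closed-form antiderivatives. The following is a small sketch; the helper names F1, F2, num1 and num2 are introduced here for illustration.

```r
# Antiderivative of x^2 - x is x^3 / 3 - x^2 / 2.
F1 <- function(x) x ^ 3 / 3 - x ^ 2 / 2
num1 <- integrate(function(x) x ^ 2 - x, 0, 10)$value
all.equal(num1, F1(10) - F1(0))

# Antiderivative of sin(x) + cos(x) is sin(x) - cos(x).
F2 <- function(x) sin(x) - cos(x)
num2 <- integrate(function(x) sin(x) + cos(x), -pi, pi)$value
abs(num2 - (F2(pi) - F2(-pi))) < 1e-10
```

The third integrand, exp(x) / x, has no elementary antiderivative, so a numerical or symbolic tool is still needed there.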

A good rule of thumb is that an anonymous function should fit on one line and shouldn’t need to use {}. Review your code. Where could you have used an anonymous function instead of a named function? Where should you have used a named function instead of an anonymous function?

Closures

Why are functions created by other functions called closures?

Closures get their name because they enclose the environment of the parent function and can access all its variables.
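A minimal sketch of what "enclosing" means in practice. The factory make_adder() and the name add2 are hypothetical, chosen only for illustration.

```r
# make_adder() returns a function that remembers the value of n
# from the call that created it.
make_adder <- function(n) {
  function(x) x + n
}

add2 <- make_adder(2)
add2(10)               # 12

# The enclosing environment of the closure still holds n.
environment(add2)$n    # 2
```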

What does the following statistical function do? What would be a better name for it? (The existing name is a bit of a hint.)

bc <- function(lambda) {
  if (lambda == 0) {
    function(x) log(x)
  } else {
    function(x) (x ^ lambda - 1) / lambda
  }
}

It returns a function that applies the Box-Cox transformation with parameter lambda. A more descriptive name would be box_cox().
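To see the transformation at work, the factory can be called with a few values of lambda. This sketch repeats the definition under the illustrative name box_cox so that it runs on its own; the test vector x is made up.

```r
box_cox <- function(lambda) {
  if (lambda == 0) {
    function(x) log(x)
  } else {
    function(x) (x ^ lambda - 1) / lambda
  }
}

x <- c(1, 2, 4)
box_cox(0)(x)   # identical to log(x)
box_cox(1)(x)   # x - 1, i.e. 0 1 3
box_cox(2)(x)   # (x^2 - 1) / 2, i.e. 0.0 1.5 7.5
```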

What does approxfun() do? What does it return?

approxfun() takes data points and returns a function that performs linear (or constant) interpolation between them. The closely related approx() instead returns a list of interpolated points.
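A small sketch of approxfun() on three made-up data points, showing that the return value is itself a function:

```r
# Interpolate through (0, 0), (1, 10), (2, 20).
f <- approxfun(x = c(0, 1, 2), y = c(0, 10, 20))

is.function(f)   # TRUE
f(c(0.5, 1.5))   # 5 15
f(3)             # NA: outside the data range under the default rule = 1
```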

What does ecdf() do? What does it return?

ecdf() computes the empirical cumulative distribution function of a numeric vector. It returns a function of class “ecdf”, which has methods for plotting, printing, and further computation.
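A short sketch with a made-up sample, showing that the result of ecdf() can be called like any other function:

```r
Fn <- ecdf(c(1, 2, 2, 3))

class(Fn)        # "ecdf" "stepfun" "function"
Fn(c(0, 2, 3))   # 0.00 0.75 1.00: share of observations <= each value
```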

Create a function that creates functions that compute the ith central moment of a numeric vector. You can test it by running the following code:

moment <- function(k) {
  function(x) {
    mean((x - mean(x)) ^ k)
  }
}
m1 <- moment(1)
m2 <- moment(2)
x <- runif(100)
stopifnot(all.equal(m1(x), 0))
stopifnot(all.equal(m2(x), var(x) * 99 / 100))
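The randomised test above can be complemented by a deterministic check on a vector whose moments are easy to work out by hand. This sketch repeats the factory so that it runs on its own; the vector x is made up.

```r
moment <- function(k) {
  function(x) mean((x - mean(x)) ^ k)
}

x <- c(1, 2, 3, 4)   # mean 2.5, deviations -1.5 -0.5 0.5 1.5
moment(2)(x)         # 1.25
moment(3)(x)         # 0, because the deviations are symmetric
```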

Create a function pick() that takes an index, i, as an argument and returns a function with an argument x that subsets x with i.

pick <- function(i) {
  function(x) x[i]
}

all.equal(lapply(mtcars, pick(5)), lapply(mtcars, function(x) x[[5]]))

## [1] TRUE

Lists of functions

Implement a summary function that works like base::summary(), but uses a list of functions. Modify the function so it returns a closure, making it possible to use it as a function factory.

summary_simple <- list(
  Min = function(x) min(x),
  First_Qu = function(x) quantile(x, names = FALSE)[2],
  Median = function(x) median(x),
  Mean = function(x) mean(x),
  Third_Qu = function(x) quantile(x, names = FALSE)[4],
  Max = function(x) max(x)
)

lapply(summary_simple, function(f) f(mtcars$mpg))

## $Min
## [1] 10.4
## 
## $First_Qu
## [1] 15.425
## 
## $Median
## [1] 19.2
## 
## $Mean
## [1] 20.09062
## 
## $Third_Qu
## [1] 22.8
## 
## $Max
## [1] 33.9
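The second half of the exercise asks for a closure. One possible sketch is a factory that takes the list of functions and returns a summary function; the names make_summary, summary_funs and my_summary are hypothetical.

```r
# Factory: takes a named list of functions, returns a summary function.
make_summary <- function(funs) {
  force(funs)
  function(x) vapply(funs, function(f) f(x), numeric(1))
}

summary_funs <- list(
  Min      = min,
  First_Qu = function(x) quantile(x, 0.25, names = FALSE),
  Median   = median,
  Mean     = mean,
  Third_Qu = function(x) quantile(x, 0.75, names = FALSE),
  Max      = max
)

my_summary <- make_summary(summary_funs)
res <- my_summary(mtcars$mpg)
res
```

Passing a different list to make_summary() yields a different summary function without touching the factory.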

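Which of the following commands is equivalent to with(x, f(z))?

(a) x$f(x$z).
(b) f(x$z).
(c) x$f(z).
(d) f(z).
(e) It depends.

Answer: (e). with() evaluates f(z) using the elements of x first and falls back to the calling environment for anything x does not contain. In the common case where x contains z but no function f, the call is equivalent to (b), f(x$z); if x also contains a function f, it is equivalent to (a), and so on.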
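How with(x, f(z)) resolves f and z can be probed with a small experiment. The objects below (f, z, x1, x2, x3) are made up for illustration.

```r
f <- function(v) v * 2
z <- 100

# x contains only z: behaves like f(x$z).
x1 <- list(z = 1)
r1 <- with(x1, f(z))   # 2

# x contains both f and z: behaves like x$f(x$z).
x2 <- list(z = 1, f = function(v) v + 10)
r2 <- with(x2, f(z))   # 11

# x contains neither: behaves like f(z) in the calling environment.
x3 <- list()
r3 <- with(x3, f(z))   # 200
```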

Case study: numerical integration

Instead of creating individual functions (e.g., midpoint(), trapezoid(), simpson(), etc.), we could store them in a list. If we did that, how would that change the code? Can you create the list of functions from a list of coefficients for the Newton-Cotes formulae?

combo <- function(f) {
  # Use `=` (not `<-`) inside list() so that the elements are named.
  list(
    midpoint = function(a, b) {
      (b - a) * f((a + b) / 2)
    },
    trapezoid = function(a, b) {
      (b - a) / 2 * (f(a) + f(b))
    },
    simpson = function(a, b) {
      (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))
    }
  )
}

kk <- combo(sin)
lapply(kk, function(f) f(0, pi))

## $midpoint
## [1] 3.141593
## 
## $trapezoid
## [1] 1.923607e-16
## 
## $simpson
## [1] 2.094395
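For the second part of the question, one possible sketch builds the rules from their coefficients. It covers only closed Newton-Cotes rules (nodes include both end points), so the midpoint rule, which is an open rule, is left out. The names newton_cotes_closed, coefs and rules are hypothetical, the returned functions take f as an argument rather than capturing it, and f is assumed to be vectorised.

```r
# Build a closed Newton-Cotes rule from its integer weights.
# Nodes are equally spaced and include both end points a and b.
newton_cotes_closed <- function(coef) {
  n <- length(coef) - 1          # number of gaps between nodes
  function(f, a, b) {
    nodes <- a + (0:n) * (b - a) / n
    (b - a) / sum(coef) * sum(coef * f(nodes))
  }
}

coefs <- list(
  trapezoid = c(1, 1),
  simpson   = c(1, 4, 1),
  boole     = c(7, 32, 12, 32, 7)
)
rules <- lapply(coefs, newton_cotes_closed)

# Apply every rule to sin() on [0, pi]; the exact area is 2.
vapply(rules, function(rule) rule(sin, 0, pi), numeric(1))
```

Adding a rule then only requires adding a coefficient vector to the list.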

The trade-off between integration rules is that more complex rules are slower to compute, but need fewer pieces. For sin() in the range [0, π], determine the number of pieces needed so that each rule will be equally accurate. Illustrate your results with a graph. How do they change for different functions? sin(1 / x^2) is particularly challenging.
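A possible starting point for this exercise is sketched below: a composite integrator plus a search for the smallest number of pieces that reaches a given accuracy. The tolerance of 1e-4, the cap max_n, and the names composite, pieces_needed and n_needed are assumptions made here; the exact area of sin() on [0, π] is 2.

```r
# Single-interval rules, written as rule(f, a, b).
midpoint  <- function(f, a, b) (b - a) * f((a + b) / 2)
trapezoid <- function(f, a, b) (b - a) / 2 * (f(a) + f(b))
simpson   <- function(f, a, b) {
  (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))
}

# Split [a, b] into n equal pieces and apply `rule` to each one.
composite <- function(f, a, b, n, rule) {
  pts <- seq(a, b, length.out = n + 1)
  sum(vapply(seq_len(n), function(i) rule(f, pts[i], pts[i + 1]), numeric(1)))
}

# Smallest n whose absolute error falls below tol (NA if none up to max_n).
pieces_needed <- function(rule, f, a, b, exact, tol = 1e-4, max_n = 1000) {
  for (n in seq_len(max_n)) {
    if (abs(composite(f, a, b, n, rule) - exact) < tol) return(n)
  }
  NA_integer_
}

rules <- list(midpoint = midpoint, trapezoid = trapezoid, simpson = simpson)
n_needed <- vapply(
  rules,
  function(rule) pieces_needed(rule, sin, 0, pi, exact = 2),
  integer(1)
)
n_needed
```

Plotting the absolute error of each rule against n on a log scale would give the graph the exercise asks for.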
