Advanced R Exercise Solution (5)

Xinchen Pan · 2018/06/13

Debugging, condition handling, and defensive programming

Condition handling

  1. Compare the following two implementations of message2error(). What is the main advantage of withCallingHandlers() in this scenario? (Hint: look carefully at the traceback.)
message2error <- function(code) {
  withCallingHandlers(code, message = function(e) stop(e))
}
message2error1 <- function(code) {
  tryCatch(code, message = function(e) stop(e))
}

Defensive programming

  1. The goal of the col_means() function defined below is to compute the means of all numeric columns in a data frame.
col_means <- function(df) {
  numeric <- sapply(df, is.numeric)
  numeric_cols <- df[, numeric]

  data.frame(lapply(numeric_cols, mean))
}

However, the function is not robust to unusual inputs. Look at the following results, decide which ones are incorrect, and modify col_means() to be more robust. (Hint: there are two function calls in col_means() that are particularly prone to problems.)

#correct
col_means(mtcars)
##        mpg    cyl     disp       hp     drat      wt     qsec     vs
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375
##        am   gear   carb
## 1 0.40625 3.6875 2.8125
col_means(mtcars[, 0])
#Error in .subset(x, j) : invalid subscript type 'list'
mtcars[0, ]
col_means(1:10)
#Error in df[, numeric] : incorrect number of dimensions
#This calculates the row mean
col_means(mtcars[, "mpg", drop = F])
##   X21 X21.1 X22.8 X21.4 X18.7 X18.1 X14.3 X24.4 X22.8.1 X19.2 X17.8 X16.4
## 1  21    21  22.8  21.4  18.7  18.1  14.3  24.4    22.8  19.2  17.8  16.4
##   X17.3 X15.2 X10.4 X10.4.1 X14.7 X32.4 X30.4 X33.9 X21.5 X15.5 X15.2.1
## 1  17.3  15.2  10.4    10.4  14.7  32.4  30.4  33.9  21.5  15.5    15.2
##   X13.3 X19.2.1 X27.3 X26 X30.4.1 X15.8 X19.7 X15 X21.4.1
## 1  13.3    19.2  27.3  26    30.4  15.8  19.7  15    21.4
col_means(as.matrix(mtcars))
#Error in df[, numeric] : (subscript) logical subscript too long
col_means(as.list(mtcars))
#Error in df[, numeric] : incorrect number of dimensions
#
mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means(mtcars2)
##   X21 X21.1 X22.8 X21.4 X18.7 X18.1 X14.3 X24.4 X22.8.1 X19.2 X17.8 X16.4
## 1  21    21  22.8  21.4  18.7  18.1  14.3  24.4    22.8  19.2  17.8  16.4
##   X17.3 X15.2 X10.4 X10.4.1 X14.7 X32.4 X30.4 X33.9 X21.5 X15.5 X15.2.1
## 1  17.3  15.2  10.4    10.4  14.7  32.4  30.4  33.9  21.5  15.5    15.2
##   X13.3 X19.2.1 X27.3 X26 X30.4.1 X15.8 X19.7 X15 X21.4.1
## 1  13.3    19.2  27.3  26    30.4  15.8  19.7  15    21.4

Revised col_means

col_means2 <- function(df) {
  df <- data.frame(df)
  numeric <- vapply(df, is.numeric, logical(1))
  numeric_cols <- df[, numeric, drop = FALSE]

  data.frame(lapply(numeric_cols, mean))
}
col_means2(mtcars)
##        mpg    cyl     disp       hp     drat      wt     qsec     vs
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375
##        am   gear   carb
## 1 0.40625 3.6875 2.8125
col_means2(mtcars[, 0])
## data frame with 0 columns and 0 rows
col_means2(mtcars[0, ])
##   mpg cyl disp  hp drat  wt qsec  vs  am gear carb
## 1 NaN NaN  NaN NaN  NaN NaN  NaN NaN NaN  NaN  NaN
col_means2(mtcars[, "mpg", drop = F])
##        mpg
## 1 20.09062
col_means2(1:10)
##    df
## 1 5.5
col_means2(as.matrix(mtcars))
##        mpg    cyl     disp       hp     drat      wt     qsec     vs
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375
##        am   gear   carb
## 1 0.40625 3.6875 2.8125
col_means2(as.list(mtcars))
##        mpg    cyl     disp       hp     drat      wt     qsec     vs
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375
##        am   gear   carb
## 1 0.40625 3.6875 2.8125
mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means2(mtcars2)
##        mpg
## 1 20.09062
  1. The following function “lags” a vector, returning a version of x that is n values behind the original. Improve the function so that it (1) returns a useful error message if n is not a vector, and (2) has reasonable behaviour when n is 0 or longer than x.
lag <- function(x, n = 1L) {
  if (!is.vector(x)){
    stop("x needs to be a vector ")
  } else{
  xlen <- length(x)
  }
  if (n > xlen){
    c(rep(NA, xlen))
  } else{
    c(rep(NA, n), x[seq_len(xlen - n)])
  }
  
}
lag(1:10, 1)
##  [1] NA  1  2  3  4  5  6  7  8  9
lag(1:10, 100)
##  [1] NA NA NA NA NA NA NA NA NA NA