Advanced R Exercise Solution (1)

I have recently been reading Advanced R by Hadley Wickham, and this time I also want to do the exercises. I am not sure I can finish all of them. Let’s see.

Data Structures

Vectors

What are the six types of atomic vector? How does a list differ from an atomic vector?

Answer:

logical, integer, double, character, complex and raw.

sapply(list(3L, 3, TRUE, "Time", 3i, raw(2)), typeof)

## [1] "integer"   "double"    "logical"   "character" "complex"   "raw"

Lists are different from atomic vectors because their elements can be of any type, including lists.

What makes is.vector() and is.numeric() fundamentally different to is.list() and is.character()?

Answer:

is.list() and is.character() each test for one specific type. is.vector() and is.numeric() do not.

is.vector() does not test if an object is a vector. Instead it returns TRUE only if the object is a vector with no attributes apart from names. is.numeric() is a general test for the “numberliness” of a vector and returns TRUE for both integer and double vectors. It is not a specific test for double vectors, which are often called numeric.

As a side note, lists are sometimes called recursive vectors, because a list can contain other lists. This makes them fundamentally different from atomic vectors.
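A small sketch to see this behaviour for is.vector() and is.numeric(). The attribute name "unit" is made up for this example.

```r
x <- 1:3
is.vector(x)              # TRUE: no attributes at all

attr(x, "unit") <- "cm"   # hypothetical attribute, any name other than "names" would do
is.vector(x)              # FALSE: an attribute other than names is present
is.atomic(x)              # TRUE: it is still an atomic vector

is.numeric(1L)            # TRUE for integer
is.numeric(1.5)           # TRUE for double
is.character(1L)          # FALSE: this one tests a single specific type
```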

Test your knowledge of vector coercion rules by predicting the output of the following uses of c():

Answer:

c(1, FALSE) == c(1, 0)

## [1] TRUE TRUE

c("a", 1) == c("a", "1")

## [1] TRUE TRUE

all.equal(c(list(1), "a"), c(list(1), list("a")))

## [1] TRUE

c(TRUE, 1L) == c(1, 1)

## [1] TRUE TRUE
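Another way to check the predictions is to look at the type of each result directly with typeof(). This is only a sketch of the coercion order logical < integer < double < character, with list on top.

```r
typeof(c(1, FALSE))       # "double": FALSE becomes 0
typeof(c("a", 1))         # "character": 1 becomes "1"
typeof(c(list(1), "a"))   # "list": "a" is wrapped into the list
typeof(c(TRUE, 1L))       # "integer": TRUE becomes 1L
```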

Why do you need to use unlist() to convert a list to an atomic vector? Why doesn’t as.vector() work?

Answer:

A list is already a vector, so as.vector() returns it unchanged. The elements of a list can be heterogeneous and nested, so to get an atomic vector the list has to be flattened with unlist(), which then coerces the values to a common type.
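A quick sketch of the difference, using a small made-up list: as.vector() hands back a list, while unlist() flattens it and coerces everything to character.

```r
l <- list(1, "a", TRUE)

v1 <- as.vector(l)
is.list(v1)     # TRUE: a list is already a vector, so nothing changes

v2 <- unlist(l)
typeof(v2)      # "character"
v2              # "1" "a" "TRUE"
```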

Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?

Answer:

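All three follow from the coercion rules. In 1 == "1", the number 1 is coerced to the string "1". In -1 < FALSE, FALSE is coerced to 0. In "one" < 2, the number 2 is coerced to the string "2" (R never tries to read "one" as 1), and the two strings are then compared in collation order, where "2" sorts before "one".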
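The three comparisons can be run directly. The last one depends on string collation, and I am assuming the usual order in which digits sort before letters.

```r
1 == "1"          # TRUE: 1 is coerced to "1"
-1 < FALSE        # TRUE: FALSE is coerced to 0
"one" < 2         # FALSE: 2 is coerced to "2", then strings are compared

as.character(2)   # "2", the value actually used in the last comparison
as.numeric(FALSE) # 0, the value actually used in the second comparison
```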

Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)

Answer:

NA is a logical constant of length 1 that holds a missing value indicator, so is.na(NA) is TRUE.

Logical is the lowest type in the coercion hierarchy, so NA is always coerced to the type of the other values when it is used inside c(). If NA were, say, a character by default, it would drag every vector it appears in up to character.
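A short sketch of how NA adapts to its neighbours inside c(), including the case from the hint:

```r
typeof(NA)                        # "logical"
typeof(c(NA, 1L))                 # "integer"
typeof(c(NA, 1.5))                # "double"
typeof(c(NA, "a"))                # "character"

# The hint: a typed NA forces the whole vector to its own type
typeof(c(FALSE, NA_character_))   # "character"
```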

Attributes

An early draft used this code to illustrate structure():

structure(1:5, comment = "my attribute")

## [1] 1 2 3 4 5

Answer:

The attribute is not missing, it is just not printed. From the help page for comment, we see that:

comment {base}  R Documentation
Query or Set a "comment" Attribute

Description

These functions set and query a comment attribute for any R objects. This is typically 
useful for data.frames or model fits.

Contrary to other attributes, the comment is not printed (by print or print.default).

Assigning NULL or a zero-length character vector removes the comment.
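To confirm that the attribute is stored even though print() hides it, it can be read back explicitly. A minimal sketch (the variable name y is mine):

```r
y <- structure(1:5, comment = "my attribute")

comment(y)            # "my attribute"
names(attributes(y))  # "comment"

comment(y) <- NULL    # assigning NULL removes it again
is.null(attributes(y))
```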

What happens to a factor when you modify its levels?

Answer:

f1 <- factor(letters)
levels(f1) <- rev(levels(f1))
f1

##  [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
## Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a

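The underlying integer codes stay the same, but each code now points to a different label, so the displayed values are reversed along with the levels.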
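A sketch that looks underneath the factor: the integer codes are compared before and after the levels are modified.

```r
f1 <- factor(letters)
codes_before <- as.integer(f1)

levels(f1) <- rev(levels(f1))
codes_after <- as.integer(f1)

identical(codes_before, codes_after)  # TRUE: only the labels moved
as.character(f1)[1:3]                 # "z" "y" "x"
```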

What does this code do? How do f2 and f3 differ from f1?

Answer:

f2 <- rev(factor(letters))
f3 <- factor(letters, levels = rev(letters))
f2

##  [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

f3

##  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
## Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a

The first line reverses the order of the elements but does not change the levels. The second line reverses the order of the levels but does not change the order of the elements.
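It may help to compare the integer codes and the labels of f2 and f3 side by side. In this sketch the codes turn out to be the same, and only the mapping from code to label differs.

```r
f2 <- rev(factor(letters))
f3 <- factor(letters, levels = rev(letters))

as.integer(f2)[1:3]     # 26 25 24
as.integer(f3)[1:3]     # 26 25 24

as.character(f2)[1:3]   # "z" "y" "x"
as.character(f3)[1:3]   # "a" "b" "c"
```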

Matrices and arrays

What does dim() return when applied to a vector?

Answer:

dim(1:4)

## NULL

If is.matrix(x) is TRUE, what will is.array(x) return?

Answer:

is.matrix(matrix(1:9, nrow = 3))

## [1] TRUE

is.array(matrix(1:9, nrow = 3))

## [1] TRUE

How would you describe the following three objects? What makes them different to 1:5?

Answer:

x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))

x1 is a stack of five \(1\times 1\) matrices, x2 is a single \(1\times 5\) matrix, and x3 is a single \(5\times 1\) matrix, each stored as a three-dimensional array.

What makes them different from 1:5 is the dim attribute:

dim(x1)

## [1] 1 1 5

dim(1:5)

## NULL
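A short sketch showing that the three arrays hold the same data as 1:5 and differ only in their dim attribute:

```r
x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))

dim(x2)                         # 1 5 1
dim(x3)                         # 5 1 1

# Dropping the dim attribute gives back the plain vector
identical(as.vector(x1), 1:5)   # TRUE
identical(as.vector(x2), 1:5)   # TRUE
identical(as.vector(x3), 1:5)   # TRUE
```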

Data Frame

What attributes does a data frame possess?

Answer:

We can use attributes() to find out the attributes of a data frame.

names(attributes(data.frame(x = 1:3, y = c("a", "b", "c"))))

## [1] "names"     "class"     "row.names"

What does as.matrix() do when applied to a data frame with columns of different types?

Answer:

The numeric values will be coerced to characters.

str(as.matrix(data.frame(x = 1:3, y = c("a", "b", "c"))))

##  chr [1:3, 1:2] "1" "2" "3" "a" "b" "c"
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "x" "y"

If the data frame contains both logical and numeric values, the logical values will be coerced to numeric.

as.matrix(data.frame(x = c(T, T, F), y = c(1, 2, 3)))

##      x y
## [1,] 1 1
## [2,] 1 2
## [3,] 0 3

Can you have a data frame with 0 rows? What about 0 columns?

Answer:

Yes, to both. A data frame with 0 rows:

data.frame(x = character(), y = numeric())

## [1] x y
## <0 rows> (or 0-length row.names)
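For the 0 columns case, here is a minimal sketch of two ways I know to get one: an empty data.frame() call, and supplying only row.names.

```r
# No rows and no columns
df0 <- data.frame()
dim(df0)            # 0 0

# Three rows but no columns
df_rows <- data.frame(row.names = 1:3)
dim(df_rows)        # 3 0
```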

Subsetting

Data Types

Fix each of the following common data frame subsetting errors:

Answer:

mtcars[mtcars$cyl = 4, ]
mtcars[-1:4, ]
mtcars[mtcars$cyl <= 5]
mtcars[mtcars$cyl == 4 | 6, ]

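## Corrected
mtcars[mtcars$cyl == 4, ]          # use == to test equality
mtcars[-(1:4), ]                   # -1:4 mixes negative and positive indices
mtcars[mtcars$cyl <= 5, ]          # the comma was missing, so columns were being selected
mtcars[mtcars$cyl %in% c(4, 6), ]  # == 4 | 6 is TRUE for every row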

Why does x <- 1:5; x[NA] yield five missing values? (Hint: why is it different from x[NA_real_]?)

Answer:

x <- 1:5
x[NA]

## [1] NA NA NA NA NA

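NA is a logical value, and a logical index is recycled to the length of the vector being subset. Each of the five recycled NA indices then yields NA. NA_real_ is a numeric index of length 1, so it is not recycled and x[NA_real_] returns a single NA.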
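The difference between the two kinds of NA index can be seen by comparing the lengths of the results. A small sketch:

```r
x <- 1:5

typeof(NA)            # "logical": recycled like any logical index
length(x[NA])         # 5

typeof(NA_real_)      # "double": treated as a single position
length(x[NA_real_])   # 1
```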

What does upper.tri() return? How does subsetting a matrix with it work? Do we need any additional subsetting rules to describe its behaviour?

Answer:

It returns a logical matrix of the same size as the given matrix, with TRUE in the upper triangle (the diagonal is excluded by default).

x <- outer(1:5, 1:5, FUN = "*")
upper.tri(x)

##       [,1]  [,2]  [,3]  [,4]  [,5]
## [1,] FALSE  TRUE  TRUE  TRUE  TRUE
## [2,] FALSE FALSE  TRUE  TRUE  TRUE
## [3,] FALSE FALSE FALSE  TRUE  TRUE
## [4,] FALSE FALSE FALSE FALSE  TRUE
## [5,] FALSE FALSE FALSE FALSE FALSE

x[upper.tri(x)]

##  [1]  2  3  6  4  8 12  5 10 15 20

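Subsetting with a logical matrix treats the matrix as a vector, so the selected values come back in column order as a plain vector and the matrix structure is lost. No additional subsetting rules are needed: it is ordinary logical subsetting.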
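One way to convince myself that this is plain vector subsetting is to convert the logical matrix to integer positions with which() and check that the result is the same. A sketch:

```r
x <- outer(1:5, 1:5, FUN = "*")

# Positions of the TRUE entries, counted down the columns
idx <- which(upper.tri(x))
idx[1:3]                              # 6 11 12

identical(x[upper.tri(x)], x[idx])    # TRUE
is.null(dim(x[upper.tri(x)]))         # TRUE: the result has no dim attribute
```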

Why does mtcars[1:20] return an error? How does it differ from the similar mtcars[1:20, ]?

Answer:

mtcars[1:20] subsets on columns, as if the data frame were a list, and mtcars has only 11 columns, so columns 12 to 20 do not exist. mtcars[1:20, ] subsets on rows, and mtcars has more than 20 rows.
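A small sketch that catches the error with tryCatch() so both cases can be run in one go. The string "error" is just a marker I chose for the failing case.

```r
ncol(mtcars)    # 11

# Column subsetting: indices 12 to 20 do not exist
res <- tryCatch(mtcars[1:20], error = function(e) "error")
res             # "error"

# Row subsetting: the first 20 of the rows
nrow(mtcars[1:20, ])   # 20
```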

Implement your own function that extracts the diagonal entries from a matrix (it should behave like diag(x) where x is a matrix).

Answer:

diagonal <- function(x) {
  # The diagonal has as many entries as the shorter dimension,
  # so this also works for non-square matrices
  n <- min(dim(x))
  # A two-column index matrix picks out the (i, i) entries
  x[cbind(seq_len(n), seq_len(n))]
}

diagonal(matrix(letters[1:16], nrow = 4))

## [1] "a" "f" "k" "p"

diag(matrix(letters[1:16],nrow=4))

## [1] "a" "f" "k" "p"

What does df[is.na(df)] <- 0 do? How does it work?

Answer:

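It replaces every NA in df with 0. is.na(df) returns a logical matrix with the same dimensions as df, and assigning through that matrix replaces exactly the elements where it is TRUE.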

df <- data.frame("a" = c(1,2, NA, NA), "a" = c(NA, NA, 1,2))
is.na(df)

##          a   a.1
## [1,] FALSE  TRUE
## [2,] FALSE  TRUE
## [3,]  TRUE FALSE
## [4,]  TRUE FALSE

df[is.na(df)]

## [1] NA NA NA NA

df[is.na(df)] <- 0
df

##   a a.1
## 1 1   0
## 2 2   0
## 3 0   1
## 4 0   2

Subsetting operators

Given a linear model, e.g., mod <- lm(mpg ~ wt, data = mtcars), extract the residual degrees of freedom. Extract the R squared from the model summary (summary(mod))

Answer:

mod <- lm(mpg ~ wt, data = mtcars)
mod$df.residual

## [1] 30

summary(mod)$r.squared

## [1] 0.7528328

Subsetting and assignment

How would you randomly permute the columns of a data frame? (This is an important technique in random forests.) Can you simultaneously permute the rows and columns in one step?

Answer:

dt <- data.frame("X1" = c(rep(1, 5)), "X2" = c(rep(2, 5)), "X3" = c(rep(3, 5)))
dt

##   X1 X2 X3
## 1  1  2  3
## 2  1  2  3
## 3  1  2  3
## 4  1  2  3
## 5  1  2  3

dt[ ,sample(ncol(dt))]

##   X1 X3 X2
## 1  1  3  2
## 2  1  3  2
## 3  1  3  2
## 4  1  3  2
## 5  1  3  2
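For the second part of the question, one possible approach is to pass a permuted index for both dimensions in a single call. A minimal sketch (the data frame dt2 and its values are made up so that the row shuffle is visible):

```r
dt2 <- data.frame(X1 = 1:5, X2 = 6:10, X3 = 11:15)

# Permute rows and columns in one step
shuffled <- dt2[sample(nrow(dt2)), sample(ncol(dt2))]

dim(shuffled)   # 5 3: same shape, only the order changed
```

The original row names are kept, so they show where each row came from.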

How would you select a random sample of m rows from a data frame? What if the sample had to be contiguous (i.e., with an initial row, a final row, and every row in between)?

Answer:

dt <- data.frame("X1" = 1:10, "X2" = 1:10, "X3" = 1:10)
dt

##    X1 X2 X3
## 1   1  1  1
## 2   2  2  2
## 3   3  3  3
## 4   4  4  4
## 5   5  5  5
## 6   6  6  6
## 7   7  7  7
## 8   8  8  8
## 9   9  9  9
## 10 10 10 10

sample_contiguous <- function(m){
  samples <- sample(nrow(dt) - m + 1, 1)
  dt[samples :(samples + m - 1), ]
}

sample_contiguous(7)

##    X1 X2 X3
## 4   4  4  4
## 5   5  5  5
## 6   6  6  6
## 7   7  7  7
## 8   8  8  8
## 9   9  9  9
## 10 10 10 10
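For the first part of the question (m rows that need not be contiguous), a sketch using sample() on the row indices. The names dt10, m and picked are only for this example.

```r
dt10 <- data.frame(X1 = 1:10, X2 = 1:10, X3 = 1:10)
m <- 4

# Draw m distinct row indices, then subset on rows
picked <- dt10[sample(nrow(dt10), m), ]

nrow(picked)   # 4
```

sample() draws without replacement by default, so no row is picked twice.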

How could you put the columns in a data frame in alphabetical order?

Answer:

dt <- data.frame("dog" = 1:5, "cat" = 1:5, "apple" = 1:5)
dt

##   dog cat apple
## 1   1   1     1
## 2   2   2     2
## 3   3   3     3
## 4   4   4     4
## 5   5   5     5

dt[, sort(names(dt))]

##   apple cat dog
## 1     1   1   1
## 2     2   2   2
## 3     3   3   3
## 4     4   4   4
## 5     5   5   5

## which is equivalent to
dt[, order(names(dt))]

##   apple cat dog
## 1     1   1   1
## 2     2   2   2
## 3     3   3   3
## 4     4   4   4
## 5     5   5   5
