I have recently been reading Advanced R by Hadley Wickham, and this time I also want to do the exercises. I am not sure I can finish all of them. Let’s see.
Data Structures
Vectors
- What are the six types of atomic vector? How does a list differ from an atomic vector?
Answer:
logical, integer, double, character, complex and raw.
sapply(list(3L, 3, TRUE, "Time", 3i, raw(2)), typeof)
## [1] "integer" "double" "logical" "character" "complex" "raw"
Lists are different from atomic vectors because their elements can be of any type, including lists.
- What makes is.vector() and is.numeric() fundamentally different to is.list() and is.character()?
Answer:
is.vector() does not test whether an object is a vector. Instead, it returns TRUE only if the object is a vector with no attributes apart from names. is.numeric() is a general test for the “numberliness” of a vector and returns TRUE for both integer and double vectors; it is not a specific test for double vectors, which are often called numeric. is.list() and is.character(), by contrast, each test for one specific type.
Lists are sometimes called recursive vectors, because a list can contain other lists. This makes them fundamentally different from atomic vectors.
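A small sketch of how these predicates behave; the attribute name "unit" is only an illustration and is not part of the exercise:

```r
x <- 1:3
plain <- is.vector(x)              # TRUE: no attributes at all

names(x) <- c("a", "b", "c")
named <- is.vector(x)              # TRUE: names are the one attribute that is allowed

attr(x, "unit") <- "cm"            # hypothetical extra attribute
with_attr <- is.vector(x)          # FALSE: any other attribute makes is.vector() fail

# is.numeric() accepts both integer and double vectors
num_int <- is.numeric(1L)          # TRUE
num_dbl <- is.numeric(1.5)         # TRUE

# is.list() and is.character() each check one specific type
chr_test <- is.character(1L)       # FALSE
lst_test <- is.list(list(1, "a"))  # TRUE
```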
- Test your knowledge of vector coercion rules by predicting the output of the following uses of c():
Answer:
c(1, FALSE) == c(1, 0)
## [1] TRUE TRUE
c("a", 1) == c("a", "1")
## [1] TRUE TRUE
all.equal(c(list(1), "a"), c(list(1), list("a")))
## [1] TRUE
c(TRUE, 1L) == c(1, 1)
## [1] TRUE TRUE
- Why do you need to use unlist() to convert a list to an atomic vector? Why doesn’t as.vector() work?
Answer:
A list is already a vector, so as.vector() returns it unchanged. The elements of a list can be heterogeneous, so to get an atomic vector the list has to be flattened with unlist(), which then coerces all values to a common type.
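To see the difference, here is a minimal sketch using a small list made up for illustration:

```r
l <- list(1, "a", TRUE)

# as.vector() leaves a list as a list, because a list already is a vector
v1 <- as.vector(l)
typeof(v1)
## [1] "list"

# unlist() flattens the list and coerces everything to the most flexible type
v2 <- unlist(l)
typeof(v2)
## [1] "character"
v2
## [1] "1"    "a"    "TRUE"
```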
- Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?
Answer:
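All three follow from the coercion rules. In 1 == "1", the number 1 is coerced to the string "1". In -1 < FALSE, FALSE is coerced to 0. In "one" < 2, R does not turn "one" into the number 1; instead 2 is coerced to "2" and the two strings are compared alphabetically, and "one" sorts after "2".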
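Each comparison can be rewritten with the coercion made explicit. This is only a sketch; string ordering depends on the locale in principle, although digits sort before letters in the usual ones:

```r
a <- 1 == "1"                 # same as as.character(1) == "1"
b <- -1 < FALSE               # same as -1 < as.numeric(FALSE), i.e. -1 < 0
d <- "one" < 2                # same as "one" < as.character(2), a string comparison

a_explicit <- as.character(1) == "1"
b_explicit <- -1 < as.numeric(FALSE)
d_explicit <- "one" < as.character(2)
```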
- Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)
Answer:
NA is a logical constant of length 1 that contains a missing value indicator. For any value x that is NA, is.na(x) is TRUE.
Logical is the lowest type in the coercion hierarchy, so NA is always coerced to the correct type when it is used inside c() together with other values.
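The following sketch shows NA taking on the type of its neighbours, and what the hint is getting at: a typed NA such as NA_character_ drags the other values up to its own type.

```r
typeof(NA)
## [1] "logical"

# A logical NA never changes the type of the values it is combined with
t_int <- typeof(c(NA, 1L))     # "integer"
t_dbl <- typeof(c(NA, 2.5))    # "double"
t_chr <- typeof(c(NA, "a"))    # "character"

# A character NA forces FALSE to become the string "FALSE"
hint <- c(FALSE, NA_character_)
typeof(hint)
## [1] "character"
```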
Attributes
- An early draft used this code to illustrate structure():
structure(1:5, comment = "my attribute")
## [1] 1 2 3 4 5
Answer:
From the help page of comment() in the base package ("Query or Set a "comment" Attribute"), we see that:
These functions set and query a comment attribute for any R objects. This is typically useful for data.frames or model fits. Contrary to other attributes, the comment is not printed (by print or print.default). Assigning NULL or a zero-length character vector removes the comment.
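The attribute is therefore present even though printing hides it. A short sketch that retrieves it explicitly:

```r
x <- structure(1:5, comment = "my attribute")

# Printing x shows only the values, but the attribute is still stored
attributes(x)
## $comment
## [1] "my attribute"

from_attr <- attr(x, "comment")
from_comment <- comment(x)
```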
- What happens to a factor when you modify its levels?
Answer:
f1 <- factor(letters)
levels(f1) <- rev(levels(f1))
f1
## [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
## Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
The underlying integer codes stay the same, but each code now points to a different label, so the printed values are reversed along with the levels.
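One way to check what happens is to compare the integer codes before and after the assignment, as in this sketch:

```r
f1 <- factor(letters)
codes_before <- as.integer(f1)

levels(f1) <- rev(levels(f1))
codes_after <- as.integer(f1)

# The codes are untouched; only the labels they map to have changed
identical(codes_before, codes_after)
## [1] TRUE
first_values <- as.character(f1)[1:3]   # "z" "y" "x"
```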
- What does this code do? How do f2 and f3 differ from f1?
Answer:
f2 <- rev(factor(letters))
f3 <- factor(letters, levels = rev(letters))
f2
## [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
f3
## [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
## Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
The first line reverses the order of the elements but does not change the levels. The second line reverses the order of the levels but does not change the order of the elements.
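Looking at the integer codes makes the difference concrete. In this sketch f2 and f3 turn out to share the same codes and differ only in how those codes are labelled:

```r
f2 <- rev(factor(letters))
f3 <- factor(letters, levels = rev(letters))

codes_f2 <- as.integer(f2)[1:3]   # 26 25 24: "z" is level 26 of a..z
codes_f3 <- as.integer(f3)[1:3]   # 26 25 24: "a" is level 26 of z..a

values_f2 <- as.character(f2)[1:3]   # "z" "y" "x"
values_f3 <- as.character(f3)[1:3]   # "a" "b" "c"
```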
Matrices and arrays
- What does dim() return when applied to a vector?
Answer:
dim(1:4)
## NULL
- If is.matrix(x) is TRUE, what will is.array(x) return?
Answer:
is.matrix(matrix(1:9, nrow = 3))
## [1] TRUE
is.array(matrix(1:9, nrow = 3))
## [1] TRUE
- How would you describe the following three objects? What makes them different to 1:5?
Answer:
x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))
x1 is a stack of five \(1\times 1\) matrices, x2 is one \(1\times 5\) matrix, and x3 is one \(5\times 1\) matrix, each stored as a three-dimensional array. Unlike 1:5, all three have a dim attribute:
dim(x1)
## [1] 1 1 5
dim(1:5)
## NULL
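The three arrays hold the same values as 1:5 and differ only in their dim attribute, which this sketch checks:

```r
x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))

dims <- list(dim(x1), dim(x2), dim(x3))

# Dropping the dim attribute gives back the plain vector 1:5
same_data <- identical(as.vector(x1), 1:5) &&
  identical(as.vector(x2), 1:5) &&
  identical(as.vector(x3), 1:5)

# The values run along a different dimension in each array
along_third  <- x1[1, 1, ]
along_second <- x2[1, , 1]
along_first  <- x3[, 1, 1]
```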
Data Frame
- What attributes does a data frame possess?
Answer:
We can use attributes() to find out the attributes of a data frame.
names(attributes(data.frame(x = 1:3, y = c("a", "b", "c"))))
## [1] "names" "class" "row.names"
- What does as.matrix() do when applied to a data frame with columns of different types?
Answer:
A matrix can hold only one type, so all columns are coerced to the most flexible type present. Here the numeric values are coerced to character.
str(as.matrix(data.frame(x = 1:3, y = c("a", "b", "c"))))
## chr [1:3, 1:2] "1" "2" "3" "a" "b" "c"
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:2] "x" "y"
If the data frame contains both logical and numeric values, the logical values will be coerced to numeric values.
as.matrix(data.frame(x = c(T, T, F), y = c(1, 2, 3)))
## x y
## [1,] 1 1
## [2,] 1 2
## [3,] 0 3
- Can you have a data frame with 0 rows? What about 0 columns?
Answer:
Yes.
data.frame(x = character(), y = numeric())
## [1] x y
## <0 rows> (or 0-length row.names)
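A data frame with 0 columns is possible too. The sketch below shows a few ways to get one; the object names are only illustrative:

```r
# No rows and no columns
df_empty <- data.frame()
dim(df_empty)
## [1] 0 0

# Three rows but no columns
df_rows_only <- data.frame(row.names = 1:3)
dim(df_rows_only)
## [1] 3 0

# Subsetting an existing data frame with FALSE drops all columns or all rows
no_cols <- mtcars[, FALSE]
no_rows <- mtcars[FALSE, ]
```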
Subsetting
Data Types
- Fix each of the following common data frame subsetting errors:
Answer:
mtcars[mtcars$cyl = 4, ]
mtcars[-1:4, ]
mtcars[mtcars$cyl <= 5]
mtcars[mtcars$cyl == 4 | 6, ]
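## Corrected
mtcars[mtcars$cyl == 4, ]          # use == for comparison, not =
mtcars[-(1:4), ]                   # -1:4 mixes negative and positive indices; use mtcars[1:4, ] if the first four rows were wanted
mtcars[mtcars$cyl <= 5, ]          # add the comma to subset rows rather than columns
mtcars[mtcars$cyl %in% c(4, 6), ]  # cyl == 4 | 6 is always TRUE because 6 is coerced to TRUE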
- Why does x <- 1:5; x[NA] yield five missing values? (Hint: why is it different from x[NA_real_]?)
Answer:
x <- 1:5
x[NA]
## [1] NA NA NA NA NA
NA is a logical value, so as an index it is recycled to the length of x, and a missing logical index yields a missing value for every element. NA_real_ is a numeric index of length one, so x[NA_real_] returns a single NA.
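The contrast between a logical and a numeric missing index can be checked directly, as in this sketch:

```r
x <- 1:5

by_logical <- x[NA]          # logical index, recycled to length 5
by_real    <- x[NA_real_]    # numeric index of length 1
by_integer <- x[NA_integer_] # numeric index of length 1

length(by_logical)
## [1] 5
length(by_real)
## [1] 1

# Recycling is the same mechanism that makes x[TRUE] return every element
all_elements <- x[TRUE]
```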
- What does upper.tri() return? How does subsetting a matrix with it work? Do we need any additional subsetting rules to describe its behaviour?
Answer:
It returns a logical matrix with the same dimensions as the given matrix, with TRUE in the upper triangle (the diagonal is excluded by default).
x <- outer(1:5, 1:5, FUN = "*")
upper.tri(x)
## [,1] [,2] [,3] [,4] [,5]
## [1,] FALSE TRUE TRUE TRUE TRUE
## [2,] FALSE FALSE TRUE TRUE TRUE
## [3,] FALSE FALSE FALSE TRUE TRUE
## [4,] FALSE FALSE FALSE FALSE TRUE
## [5,] FALSE FALSE FALSE FALSE FALSE
x[upper.tri(x)]
## [1] 2 3 6 4 8 12 5 10 15 20
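Subsetting with this logical matrix returns a plain vector of the selected entries in column-major order, because the matrix structure cannot be preserved unless we select complete rows or columns. No additional subsetting rules are needed: the logical matrix is treated like a logical vector.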
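A quick way to convince yourself that no new rule is involved is to strip the matrix structure from the index and compare the results. A minimal sketch:

```r
x <- outer(1:5, 1:5, FUN = "*")
idx <- upper.tri(x)

by_matrix  <- x[idx]              # logical matrix as index
by_vector  <- x[as.vector(idx)]   # the same index as a plain logical vector
by_integer <- x[which(idx)]       # the same positions as integer indices

# Passing diag = TRUE includes the diagonal as well
with_diag <- x[upper.tri(x, diag = TRUE)]
```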
- Why does mtcars[1:20] return an error? How does it differ from the similar mtcars[1:20, ]?
Answer:
mtcars[1:20] subsets columns, and there are only 11 columns in mtcars, so asking for columns 12 to 20 fails. mtcars[1:20, ] subsets rows and returns the first 20 rows.
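The error can be captured with tryCatch() to see both behaviours side by side. This is a sketch; the exact wording of the message may depend on the R version and locale:

```r
# Single-index subsetting treats the data frame as a list of columns
msg <- tryCatch(
  mtcars[1:20],
  error = function(e) conditionMessage(e)   # typically "undefined columns selected"
)

ncol(mtcars)
## [1] 11

# With a comma, the first index refers to rows
first_twenty <- mtcars[1:20, ]
dim(first_twenty)
## [1] 20 11
```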
- Implement your own function that extracts the diagonal entries from a matrix (it should behave like diag(x) where x is a matrix).
Answer:
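diagonal <- function(x) {
  stopifnot(is.matrix(x))
  # The diagonal ends at the shorter dimension, so non-square matrices work too
  idx <- seq_len(min(dim(x)))
  # A two-column matrix of (row, column) pairs selects one entry per pair
  x[cbind(idx, idx)]
}
diagonal(matrix(letters[1:16], nrow = 4))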
## [1] "a" "f" "k" "p"
diag(matrix(letters[1:16],nrow=4))
## [1] "a" "f" "k" "p"
- What does df[is.na(df)] <- 0 do? How does it work?
Answer:
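It replaces every NA in df with 0. is.na(df) returns a logical matrix with the same dimensions as df, and assigning through that matrix overwrites exactly the cells where it is TRUE.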
df <- data.frame("a" = c(1,2, NA, NA), "a" = c(NA, NA, 1,2))
is.na(df)
## a a.1
## [1,] FALSE TRUE
## [2,] FALSE TRUE
## [3,] TRUE FALSE
## [4,] TRUE FALSE
df[is.na(df)]
## [1] NA NA NA NA
df[is.na(df)] <- 0
df
## a a.1
## 1 1 0
## 2 2 0
## 3 0 1
## 4 0 2
Subsetting operators
- Given a linear model, e.g., mod <- lm(mpg ~ wt, data = mtcars), extract the residual degrees of freedom. Extract the R squared from the model summary (summary(mod)).
Answer:
mod <- lm(mpg ~ wt, data = mtcars)
mod$df.residual
## [1] 30
summary(mod)$r.squared
## [1] 0.7528328
Subsetting and assignment
- How would you randomly permute the columns of a data frame? (This is an important technique in random forests.) Can you simultaneously permute the rows and columns in one step?
Answer:
dt <- data.frame("X1" = c(rep(1, 5)), "X2" = c(rep(2, 5)), "X3" = c(rep(3, 5)))
dt
## X1 X2 X3
## 1 1 2 3
## 2 1 2 3
## 3 1 2 3
## 4 1 2 3
## 5 1 2 3
dt[ ,sample(ncol(dt))]
## X1 X3 X2
## 1 1 3 2
## 2 1 3 2
## 3 1 3 2
## 4 1 3 2
## 5 1 3 2
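Rows and columns can be permuted in one step by passing a random index for each dimension. The sketch below uses a data frame with distinct values so that the row shuffle is visible; the seed is an arbitrary choice for reproducibility:

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

dt2 <- data.frame(X1 = 1:5, X2 = 6:10, X3 = 11:15)

# First index shuffles the rows, second index shuffles the columns
permuted <- dt2[sample(nrow(dt2)), sample(ncol(dt2))]
permuted
```

Each row still holds the same three values as before; only the order of rows and columns changes.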
- How would you select a random sample of m rows from a data frame? What if the sample had to be contiguous (i.e., with an initial row, a final row, and every row in between)?
Answer:
dt <- data.frame("X1" = 1:10, "X2" = 1:10, "X3" = 1:10)
dt
## X1 X2 X3
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5
## 6 6 6 6
## 7 7 7 7
## 8 8 8 8
## 9 9 9 9
## 10 10 10 10
sample_contiguous <- function(m){
  # pick a random start row that leaves room for m consecutive rows
  samples <- sample(nrow(dt) - m + 1, 1)
  dt[samples:(samples + m - 1), ]
}
sample_contiguous(7)
## X1 X2 X3
## 4 4 4 4
## 5 5 5 5
## 6 6 6 6
## 7 7 7 7
## 8 8 8 8
## 9 9 9 9
## 10 10 10 10
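For the first part of the question, where the rows do not have to be contiguous, a minimal sketch could look like this. The function name sample_rows and its arguments are made up for illustration:

```r
# Hypothetical helper: draw m distinct rows of df at random
sample_rows <- function(df, m) {
  stopifnot(m <= nrow(df))
  # sample() draws without replacement by default; drop = FALSE keeps a data frame
  df[sample(nrow(df), m), , drop = FALSE]
}

dt3 <- data.frame(X1 = 1:10, X2 = 1:10, X3 = 1:10)
picked <- sample_rows(dt3, 4)
picked
```

Taking the data frame as an argument, rather than reading a global dt, lets the same helper be reused on any data frame.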
- How could you put the columns in a data frame in alphabetical order?
Answer:
dt <- data.frame("dog" = 1:5, "cat" = 1:5, "apple" = 1:5)
dt
## dog cat apple
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5
dt[, sort(names(dt))]
## apple cat dog
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5
## which is equivalent to
dt[, order(names(dt))]
## apple cat dog
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5