I have recently been reading Advanced R by Hadley Wickham, and this time I also want to do the exercises. I'm not sure I can finish all of them, but let's see.

## Data Structures

### Vectors

- What are the six types of atomic vector? How does a list differ from an atomic vector?

**Answer:**

logical, integer, double, character, complex and raw.

`sapply(list(3L, 3, TRUE, "Time", 3i, raw(2)), typeof)`

`## [1] "integer" "double" "logical" "character" "complex" "raw"`

Lists are different from atomic vectors because their elements can be of any type, including lists.

- What makes `is.vector()` and `is.numeric()` fundamentally different to `is.list()` and `is.character()`?

**Answer:**

`is.vector()` does not test if an object is a vector. Instead, it returns TRUE only if the object is a vector with no attributes apart from names. `is.numeric()` is a general test for the “numberliness” of a vector and returns TRUE for both integer and double vectors. It is not a specific test for double vectors, which are often called numeric.

Lists are sometimes called recursive vectors, because a list can contain other lists. This makes them fundamentally different from atomic vectors.
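A small sketch of the difference, using a made-up vector `x` and a hypothetical attribute name `my_attr` (neither comes from the book):

```r
x <- 1:3
names(x) <- c("a", "b", "c")
named_is_vector <- is.vector(x)      # TRUE: names are the one allowed attribute

attr(x, "my_attr") <- "something"    # hypothetical extra attribute
attr_is_vector <- is.vector(x)       # FALSE, although x is still an integer vector

# is.numeric() is TRUE for both integer and double vectors
int_is_numeric <- is.numeric(1L)     # TRUE
dbl_is_numeric <- is.numeric(1.5)    # TRUE

# is.list() and is.character() each test for one specific type
is.list(list(1, "a"))                # TRUE
is.character("a")                    # TRUE
```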

- Test your knowledge of vector coercion rules by predicting the output of the following uses of `c()`:

**Answer:**

`c(1, FALSE) == c(1, 0)`

`## [1] TRUE TRUE`

`c("a", 1) == c("a", "1")`

`## [1] TRUE TRUE`

`all.equal(c(list(1), "a"), c(list(1), list("a")))`

`## [1] TRUE`

`c(TRUE, 1L) == c(1, 1)`

`## [1] TRUE TRUE`

- Why do you need to use `unlist()` to convert a list to an atomic vector? Why doesn’t `as.vector()` work?

**Answer:**

The elements of a list can be heterogeneous, so the list has to be flattened with `unlist()`, which coerces all the values to a common type. `as.vector()` does not work because a list is already a vector (of mode "list"), so the result is still a list.
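A minimal check of this behaviour, using an arbitrary example list `l`:

```r
l <- list(1, "a", TRUE)

v1 <- as.vector(l)
is.list(v1)      # TRUE: the result is still a list

v2 <- unlist(l)
is.atomic(v2)    # TRUE: now an atomic vector
typeof(v2)       # "character", the most flexible type among the elements
```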

- Why is `1 == "1"` true? Why is `-1 < FALSE` true? Why is `"one" < 2` false?

**Answer:**

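All three results follow from the coercion rules. In `1 == "1"`, the number 1 is coerced to the string "1". In `-1 < FALSE`, `FALSE` is coerced to 0. In `"one" < 2`, the number 2 is coerced to the string "2" and the two strings are compared alphabetically; "one" is never converted to 1, and it sorts after "2".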
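The individual coercion steps can be checked one at a time. This is only a sketch; the variable names are mine:

```r
# The number is coerced to a string before comparing
one_as_string <- as.character(1)     # "1"
cmp1 <- 1 == "1"                     # TRUE

# The logical is coerced to a number before comparing
false_as_number <- as.numeric(FALSE) # 0
cmp2 <- -1 < FALSE                   # TRUE

# The number is coerced to a string, then the strings are compared
cmp3 <- "one" < 2                    # FALSE
cmp3_explicit <- "one" < "2"         # the same comparison written out
```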

- Why is the default missing value, `NA`, a logical vector? What’s special about logical vectors? (Hint: think about `c(FALSE, NA_character_)`.)

**Answer:**

`NA` is a logical constant of length 1 that contains a missing value indicator.

Logical vectors sit at the bottom of the coercion hierarchy, so `NA` is always coerced to the type of the other elements when used inside `c()` and never changes their type. If the default `NA` were a character instead, `c(FALSE, NA)` would turn into a character vector, just as `c(FALSE, NA_character_)` does.
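A short sketch of how `NA` adapts to its neighbours inside `c()`:

```r
typeof(NA)                           # "logical"

types <- c(
  with_integer   = typeof(c(NA, 1L)),
  with_double    = typeof(c(NA, 1.5)),
  with_character = typeof(c(NA, "a"))
)
types                                # "integer" "double" "character"

# A typed NA, by contrast, forces the other elements to change type
hint_type <- typeof(c(FALSE, NA_character_))   # "character"
```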

### Attributes

- An early draft used this code to illustrate `structure()`:

`structure(1:5, comment = "my attribute")`

`## [1] 1 2 3 4 5`

**Answer:**

From the help page (`?comment`), we see that:

> comment {base}: Query or Set a "comment" Attribute
>
> These functions set and query a comment attribute for any R objects. This is typically useful for data.frames or model fits. Contrary to other attributes, the comment is not printed (by print or print.default). Assigning NULL or a zero-length character vector removes the comment.
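The attribute is therefore still there, it is just not printed. A quick way to confirm this:

```r
x <- structure(1:5, comment = "my attribute")

# Printing x hides the comment, but it can still be retrieved
attributes(x)
stored <- attr(x, "comment")         # "my attribute"
via_helper <- comment(x)             # same value via the dedicated accessor
```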

- What happens to a factor when you modify its levels?

**Answer:**

```
f1 <- factor(letters)
levels(f1) <- rev(levels(f1))
f1
```

```
## [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
## Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
```

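The displayed values change along with the levels. A factor stores integer codes that index into its levels, so replacing the levels relabels every element while the underlying codes stay the same.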
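One way to check what modifying the levels does is to compare the integer codes stored in the factor before and after. A sketch:

```r
f1 <- factor(letters)
codes_before <- as.integer(f1)

levels(f1) <- rev(levels(f1))
codes_after <- as.integer(f1)

# The codes are untouched; only the labels they point to have changed
same_codes <- identical(codes_before, codes_after)   # TRUE
first_value <- as.character(f1)[1]                   # "z" (was "a")
```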

- What does this code do? How do f2 and f3 differ from f1?

**Answer:**

```
f2 <- rev(factor(letters))
f3 <- factor(letters, levels = rev(letters))
f2
```

```
## [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
```

`f3`

```
## [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
## Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
```

`f2` reverses the order of the elements but leaves the levels unchanged. `f3` reverses the order of the levels but leaves the elements in their original order.

### Matrices and arrays

- What does `dim()` return when applied to a vector?

**Answer:**

`dim(1:4)`

`## NULL`

- If `is.matrix(x)` is TRUE, what will `is.array(x)` return?

**Answer:**

`is.matrix(matrix(1:9, nrow = 3))`

`## [1] TRUE`

`is.array(matrix(1:9, nrow = 3))`

`## [1] TRUE`

- How would you describe the following three objects? What makes them different to 1:5?

**Answer:**

```
x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))
```

`x1` is a stack of five \(1\times 1\) matrices, `x2` is a single \(1\times 5\) matrix, and `x3` is a single \(5\times 1\) matrix, each stored as a three-dimensional array. Unlike `1:5`, all three have a `dim` attribute:

`dim(x1)`

`## [1] 1 1 5`

`dim(1:5)`

`## NULL`
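The data in the arrays are the same as in `1:5`; only the `dim` attribute differs. A short sketch that makes this explicit:

```r
x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))

# Dropping the dim attribute recovers the plain vector
same_data <- identical(as.vector(x1), 1:5)   # TRUE

# The same value sits at a different index in each array
third <- c(x1[1, 1, 3], x2[1, 3, 1], x3[3, 1, 1])   # 3 3 3
```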

### Data Frame

- What attributes does a data frame possess?

**Answer:**

We can use `attributes()` to find out the attributes of a data frame.

`names(attributes(data.frame(x = 1:3, y = c("a", "b", "c"))))`

`## [1] "names" "class" "row.names"`

- What does `as.matrix()` do when applied to a data frame with columns of different types?

**Answer:**

The numeric values will be coerced to characters.

`str(as.matrix(data.frame(x = 1:3, y = c("a", "b", "c"))))`

```
## chr [1:3, 1:2] "1" "2" "3" "a" "b" "c"
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:2] "x" "y"
```

If the data frame contains both logical and numeric values, the logical values will be coerced to numeric values.

`as.matrix(data.frame(x = c(T, T, F), y = c(1, 2, 3)))`

```
## x y
## [1,] 1 1
## [2,] 1 2
## [3,] 0 3
```

- Can you have a data frame with 0 rows? What about 0 columns?

**Answer:**

Yes.

`data.frame(x = character(), y = numeric())`

```
## [1] x y
## <0 rows> (or 0-length row.names)
```
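A data frame with 0 columns is possible as well. The sketch below builds the empty cases by subsetting a small example data frame `df` (my own example, not from the book):

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

zero_rows <- df[0, ]        # keeps both columns, drops all rows
zero_cols <- df[FALSE]      # keeps the row names, drops all columns
empty     <- data.frame()   # neither rows nor columns

dim(zero_rows)              # 0 2
dim(zero_cols)              # 3 0
dim(empty)                  # 0 0
```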

## Subsetting

### Data Types

- Fix each of the following common data frame subsetting errors:

**Answer:**

```
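# Original (broken) versions:
# mtcars[mtcars$cyl = 4, ]
# mtcars[-1:4, ]
# mtcars[mtcars$cyl <= 5]
# mtcars[mtcars$cyl == 4 | 6, ]

# Corrected:
mtcars[mtcars$cyl == 4, ]          # use == for comparison, not =
mtcars[-(1:4), ]                   # parenthesise so every index is negative
mtcars[mtcars$cyl <= 5, ]          # add the comma to subset rows
mtcars[mtcars$cyl %in% c(4, 6), ]  # `== 4 | 6` is TRUE for every row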
```

- Why does `x <- 1:5; x[NA]` yield five missing values? (Hint: why is it different from `x[NA_real_]`?)

**Answer:**

```
x <- 1:5
x[NA]
```

`## [1] NA NA NA NA NA`

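`NA` is a logical value, so `x[NA]` is logical subsetting: the index is recycled to the length of `x`, and a missing value in a logical index always yields a missing value in the output. `NA_real_` is a numeric index, so it is not recycled and selects a single (missing) element.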
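The difference between the logical and the numeric `NA` index shows up in the length of the result. A quick sketch:

```r
x <- 1:5

n_logical <- length(x[NA])         # 5: the logical NA is recycled
n_numeric <- length(x[NA_real_])   # 1: a numeric index is not recycled

# A logical index mixing TRUE, FALSE and NA shows the rule element by element
mixed <- x[c(TRUE, NA, FALSE, NA, TRUE)]   # 1 NA NA 5
```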

- What does `upper.tri()` return? How does subsetting a matrix with it work? Do we need any additional subsetting rules to describe its behaviour?

**Answer:**

It returns a logical matrix of the same size as the given matrix, with TRUE in the upper triangle.

```
x <- outer(1:5, 1:5, FUN = "*")
upper.tri(x)
```

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] FALSE TRUE TRUE TRUE TRUE
## [2,] FALSE FALSE TRUE TRUE TRUE
## [3,] FALSE FALSE FALSE TRUE TRUE
## [4,] FALSE FALSE FALSE FALSE TRUE
## [5,] FALSE FALSE FALSE FALSE FALSE
```

`x[upper.tri(x)]`

`## [1] 2 3 6 4 8 12 5 10 15 20`

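Subsetting a matrix with a logical matrix treats both as vectors in column-major order, so the result is a plain vector and the matrix structure is not preserved. This is ordinary logical subsetting, so no additional rules seem to be needed.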
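To check that this is just logical vector subsetting in column-major order, the matrix and the logical index can both be flattened first. A sketch:

```r
x <- outer(1:5, 1:5, FUN = "*")

by_matrix <- x[upper.tri(x)]
by_vector <- as.vector(x)[as.vector(upper.tri(x))]

# Both routes give the same plain vector, with no dim attribute
same_result <- identical(by_matrix, by_vector)   # TRUE
no_dims <- is.null(dim(by_matrix))               # TRUE
```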

- Why does `mtcars[1:20]` return an error? How does it differ from the similar `mtcars[1:20, ]`?

**Answer:**

`mtcars[1:20]` subsets columns, and `mtcars` has only 11 columns, so it returns an error. `mtcars[1:20, ]` subsets rows and returns the first 20 of them.
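Both calls can be compared without stopping the session by catching the error. A sketch using `tryCatch()`:

```r
# Column subsetting: 20 columns requested, only 11 available
msg <- tryCatch(
  mtcars[1:20],
  error = function(e) conditionMessage(e)
)
msg                          # the error message as a string

# Row subsetting: mtcars has 32 rows, so the first 20 exist
first_20 <- mtcars[1:20, ]
dim(first_20)                # 20 11
```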

- Implement your own function that extracts the diagonal entries from a matrix (it should behave like `diag(x)` where x is a matrix).

**Answer:**

```
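diagonal <- function(x) {
  stopifnot(is.matrix(x))
  # The diagonal has as many entries as the shorter dimension,
  # so this also works for non-square matrices
  idx <- seq_len(min(dim(x)))
  # Index with a two-column matrix of (row, column) pairs
  x[cbind(idx, idx)]
}
diagonal(matrix(letters[1:16], nrow = 4))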
```

`## [1] "a" "f" "k" "p"`

`diag(matrix(letters[1:16],nrow=4))`

`## [1] "a" "f" "k" "p"`

- What does `df[is.na(df)] <- 0` do? How does it work?

**Answer:**

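It replaces all `NA` values with 0. `is.na(df)` returns a logical matrix with the same dimensions as `df`, and assigning through that matrix replaces exactly the positions where it is TRUE.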

```
df <- data.frame("a" = c(1,2, NA, NA), "a" = c(NA, NA, 1,2))
is.na(df)
```

```
## a a.1
## [1,] FALSE TRUE
## [2,] FALSE TRUE
## [3,] TRUE FALSE
## [4,] TRUE FALSE
```

`df[is.na(df)] `

`## [1] NA NA NA NA`

```
df[is.na(df)] <- 0
df
```

```
## a a.1
## 1 1 0
## 2 2 0
## 3 0 1
## 4 0 2
```

### Subsetting operators

- Given a linear model, e.g., `mod <- lm(mpg ~ wt, data = mtcars)`, extract the residual degrees of freedom. Extract the R squared from the model summary (`summary(mod)`).

**Answer:**

```
mod <- lm(mpg ~ wt, data = mtcars)
mod$df.residual
```

`## [1] 30`

`summary(mod)$r.squared`

`## [1] 0.7528328`

### Subsetting and assignment

- How would you randomly permute the columns of a data frame? (This is an important technique in random forests.) Can you simultaneously permute the rows and columns in one step?

**Answer:**

```
dt <- data.frame("X1" = c(rep(1, 5)), "X2" = c(rep(2, 5)), "X3" = c(rep(3, 5)))
dt
```

```
## X1 X2 X3
## 1 1 2 3
## 2 1 2 3
## 3 1 2 3
## 4 1 2 3
## 5 1 2 3
```

`dt[ ,sample(ncol(dt))]`

```
## X1 X3 X2
## 1 1 3 2
## 2 1 3 2
## 3 1 3 2
## 4 1 3 2
## 5 1 3 2
```
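Rows and columns can be permuted in the same subsetting call by passing a random index for each dimension. A sketch, using a data frame with distinct values so the row shuffle is visible (the seed is arbitrary):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
dt <- data.frame(X1 = 1:5, X2 = 6:10, X3 = 11:15)

# First index permutes the rows, second index permutes the columns
shuffled <- dt[sample(nrow(dt)), sample(ncol(dt))]
shuffled
```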

- How would you select a random sample of m rows from a data frame? What if the sample had to be contiguous (i.e., with an initial row, a final row, and every row in between)?

**Answer:**

```
dt <- data.frame("X1" = 1:10, "X2" = 1:10, "X3" = 1:10)
dt
```

```
## X1 X2 X3
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5
## 6 6 6 6
## 7 7 7 7
## 8 8 8 8
## 9 9 9 9
## 10 10 10 10
```

```
sample_contiguous <- function(df, m) {
  # Pick a random start row so that m consecutive rows fit in df
  start <- sample(nrow(df) - m + 1, 1)
  df[start:(start + m - 1), , drop = FALSE]
}
sample_contiguous(dt, 7)
```

```
## X1 X2 X3
## 4 4 4 4
## 5 5 5 5
## 6 6 6 6
## 7 7 7 7
## 8 8 8 8
## 9 9 9 9
## 10 10 10 10
```
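For the first part of the question, a plain (non-contiguous) random sample of m rows can be drawn with `sample()`. A sketch, where `sample_rows` is a hypothetical helper name of my own:

```r
dt <- data.frame(X1 = 1:10, X2 = 1:10, X3 = 1:10)

# Hypothetical helper: draw m distinct row indices and subset with them
sample_rows <- function(df, m) {
  df[sample(nrow(df), m), , drop = FALSE]
}

s <- sample_rows(dt, 4)
s
```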

- How could you put the columns in a data frame in alphabetical order?

**Answer:**

```
dt <- data.frame("dog" = 1:5, "cat" = 1:5, "apple" = 1:5)
dt
```

```
## dog cat apple
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5
```

`dt[, sort(names(dt))]`

```
## apple cat dog
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5
```

```
# which is equivalent to
dt[, order(names(dt))]
```

```
## apple cat dog
## 1 1 1 1
## 2 2 2 2
## 3 3 3 3
## 4 4 4 4
## 5 5 5 5
```