I just use the normal distribution as an example in the title. The prefixes d, p, q, and r name a family of functions that exists for many different distributions. Although we learn what they mean in an introductory mathematical statistics course, they can still look confusing, especially p and q.
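To see that the prefixes behave the same way for other distributions, here is a small sketch using the Poisson distribution. The choice of Poisson and of lambda = 3 is arbitrary and only for illustration; dpois, ppois, qpois and rpois are all part of base R's stats package.

```r
# The same d/p/q/r prefixes applied to a Poisson distribution with lambda = 3
lambda <- 3

pmf_at_2 <- dpois(2, lambda = lambda)    # d: P(X = 2), the probability mass function
cdf_at_2 <- ppois(2, lambda = lambda)    # p: P(X <= 2), the cumulative distribution function
median_x <- qpois(0.5, lambda = lambda)  # q: smallest x with P(X <= x) >= 0.5
draws    <- rpois(5, lambda = lambda)    # r: 5 random values from the distribution

# For a discrete distribution the cdf is a running sum of the pmf
all.equal(cdf_at_2, sum(dpois(0:2, lambda = lambda)))
## [1] TRUE
```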
Let’s start with dnorm and rnorm.
dnorm
The function dnorm(x, mean = 0, sd = 1, log = FALSE) simply evaluates the probability density function (pdf) at the value x. For a discrete distribution, the corresponding d function (for example dbinom or dpois) evaluates the probability mass function instead.
So for the normal distribution with \(mean=0, sd=1\), the pdf is
\[ f(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} \]
Plugging \(x=2\) into the pdf gives
1 / sqrt(2 * pi) * exp(-2^2 / 2)
## [1] 0.05399097
which is the same as
dnorm(x = 2, mean = 0, sd = 1)
## [1] 0.05399097
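The same check works for any mean and sd. As a minimal sketch, with \(\mu = 1\) and \(\sigma = 2\) picked arbitrarily, the general pdf is \(\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\); the helper normal_pdf below is a hypothetical name used only for this comparison. The block also shows the log argument and that a density value is not a probability.

```r
# Hypothetical helper: hand-coded N(mu, sigma^2) density, compared with dnorm()
normal_pdf <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(-1, 0, 2, 5)
all.equal(normal_pdf(x, mu = 1, sigma = 2), dnorm(x, mean = 1, sd = 2))
## [1] TRUE

# log = TRUE returns log(density), which is handy when densities are tiny
all.equal(dnorm(x, mean = 1, sd = 2, log = TRUE), log(dnorm(x, mean = 1, sd = 2)))
## [1] TRUE

# A density is not a probability: it can exceed 1 when sd is small
dnorm(0, mean = 0, sd = 0.1)
## [1] 3.989423
```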
rnorm
rnorm(n, mean = 0, sd = 1) returns n random values drawn from the normal distribution with \(mean=0\) and \(sd=1\).
For example, if we generate 100 values from \(N(0,1)\), it is very unlikely that any of them will be 10000. In fact, we can compute the probability of drawing a value of 10000 or larger from \(N(0,1)\) by using pnorm. Most values will lie close to 0: for \(N(0,1)\), about 95% of them fall within 2 standard deviations of the mean. By the law of large numbers, the mean and sd of the randomly generated values get closer to the theoretical values as n gets larger.
rnorm(100, mean = 0, sd = 1)
## [1] 0.44397253 2.26974278 0.96213017 -0.98114321 -0.81673697
## [6] -1.27911926 0.43560479 0.63864735 0.70865044 1.91058328
## [11] -0.79346382 -0.74380923 0.19057270 -1.91290216 0.65753297
## [16] 0.64780687 0.85601248 0.42054690 1.59931574 1.85609449
## [21] 0.92548581 -0.63423536 0.40053807 1.11895633 0.86968372
## [26] -0.88021520 0.69891915 -1.13410683 0.42440412 -1.54164780
## [31] -1.41371545 -0.77129951 0.58247868 -0.60981978 1.61671347
## [36] -0.19234311 -0.43230939 -1.69311707 1.28331089 -0.43960770
## [41] -1.26880188 -1.03024181 -0.09301054 0.09630727 0.09567935
## [46] -0.95457462 0.15968128 1.59552431 0.70149448 0.59702470
## [51] -0.79018483 -0.46857261 1.33755335 -0.99504568 0.05257650
## [56] -1.56017586 -1.09044670 3.44503337 -0.67710208 -0.65193628
## [61] 0.51748999 -0.64310700 -0.98015442 2.08505345 -0.03036714
## [66] -0.74714762 -0.56065081 1.69428481 0.87185800 -0.24940924
## [71] 2.89343687 -1.29225632 -0.07762765 0.78040052 0.54147203
## [76] 0.77056421 1.56432169 3.31402743 0.01087223 -0.54794083
## [81] 1.23263952 0.88385819 -0.05748334 0.80355828 1.25799155
## [86] -1.75506811 -0.35114983 -1.24856268 0.58143097 -0.16829024
## [91] 0.19160874 1.31997751 0.77578134 2.62336213 -0.22477977
## [96] 1.61008297 -1.08468341 -2.25972128 1.11632542 -1.15402003
mean(rnorm(100, mean = 0, sd = 1))
## [1] 0.001469718
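Note that each call to rnorm draws a fresh sample, so the mean printed above is not the mean of the 100 values listed before it. A small sketch of the law of large numbers: fix the seed with set.seed() (the seed 42 and the sample sizes are arbitrary) and watch the sample mean and sd approach 0 and 1 as n grows.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

for (n in c(10L, 100L, 10000L, 1000000L)) {
  x <- rnorm(n, mean = 0, sd = 1)
  # The sample mean should approach 0 and the sample sd should approach 1
  cat(sprintf("n = %7d   mean = %8.4f   sd = %.4f\n", n, mean(x), sd(x)))
}
```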
pnorm
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) returns the cumulative probability \(P(X \le q)\) by default. If we set lower.tail = FALSE, it returns \(P(X > q) = 1 - P(X \le q)\).
Let’s look at the extreme example I mentioned above: what is \(P(X < 10000)\) for \(N(0,1)\)? It is essentially 1. In other words, \(P(X > 10000)\) is essentially 0, about as likely as meeting a human being who is 40 m tall (Ultraman).
It is important to remember that pnorm returns a probability, not a density value.
pnorm(0, mean = 0, sd = 1)
## [1] 0.5
pnorm(10000, mean = 0, sd = 1)
## [1] 1
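A few more checks, as a sketch: lower.tail = FALSE gives the upper tail, pnorm equals the area under the dnorm curve (computed here with base R's integrate()), and the share of rnorm draws at or below a point approaches pnorm at that point. The seed and the sample size 1e6 are arbitrary choices.

```r
# Upper tail: P(X > 2) = 1 - P(X <= 2)
pnorm(2, mean = 0, sd = 1, lower.tail = FALSE)
## [1] 0.02275013
all.equal(pnorm(2, lower.tail = FALSE), 1 - pnorm(2))
## [1] TRUE

# pnorm(q) is the integral of dnorm from -Inf up to q (numerical integration)
area <- integrate(dnorm, lower = -Inf, upper = 1, mean = 0, sd = 1)$value
area - pnorm(1)  # should be very close to 0

# The proportion of simulated values <= 1 approaches pnorm(1)
set.seed(1)
x <- rnorm(1e6)
mean(x <= 1)
pnorm(1)
## [1] 0.8413447

# P(X > 10000) underflows to exactly 0 in double precision;
# log.p = TRUE returns log(P(X > 10000)), which stays finite
pnorm(10000, lower.tail = FALSE)
## [1] 0
pnorm(10000, lower.tail = FALSE, log.p = TRUE)
```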
qnorm
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) is the inverse of pnorm: it returns the value q such that \(P(X \le q) = p\). Since p is a probability, it needs to be within \([0,1]\) (with the default log.p = FALSE).
qnorm(0.999, mean = 0, sd = 1, lower.tail = TRUE)
## [1] 3.090232
pnorm(3.090232, mean = 0, sd = 1, lower.tail = TRUE)
## [1] 0.999
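Some more qnorm sketches: the familiar 1.96 critical value, the upper-tail version, a vectorised round trip through pnorm, and what happens at or beyond the ends of \([0,1]\). The mean 170 and sd 10 in the round trip are arbitrary values chosen for illustration.

```r
# 97.5% quantile of N(0, 1): the usual critical value for a two-sided 95% interval
qnorm(0.975)
## [1] 1.959964

# lower.tail = FALSE returns the value with probability p above it
all.equal(qnorm(0.025, lower.tail = FALSE), qnorm(0.975))
## [1] TRUE

# qnorm undoes pnorm for any mean and sd (arbitrary values here)
probs <- c(0.01, 0.25, 0.5, 0.75, 0.99)
quantiles <- qnorm(probs, mean = 170, sd = 10)
all.equal(pnorm(quantiles, mean = 170, sd = 10), probs)
## [1] TRUE

# Boundary and out-of-range inputs
qnorm(c(0, 1))
## [1] -Inf  Inf
qnorm(1.5)  # NaN, with a warning
```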