Chapter 1.2.5.2 Sampling Distributions

We describe the t, χ^2 and F distributions in the context of samples from normal random variables (rvs).

The emphasis is on understanding the relationship between each of these rvs and how it arises from X¯ and S.

The goal is for you not to be surprised when we say later that test statistics have certain distributions. You may not be able to prove it, but hopefully won’t be surprised either. We’ll use density estimation extensively to illustrate the relationships.

Given X1,…,Xn, which we assume in this section are independent and identically distributed with mean μ and standard deviation σ, we define the sample mean

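$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

and the sample variance

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2,$$

whose square root S is the sample standard deviation.

We note that X¯ is an estimator for μ and S is an estimator for σ.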
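As a quick check of these definitions, the sketch below computes the sample mean and sample variance by hand and compares them with R's built-in `mean`, `var` and `sd`. The data vector `x` is hypothetical and chosen only for illustration.

```r
# Hypothetical data, chosen only for illustration.
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)
n <- length(x)

xbar <- sum(x) / n                    # sample mean
s2   <- sum((x - xbar)^2) / (n - 1)   # sample variance (divisor n - 1)
s    <- sqrt(s2)                      # sample standard deviation

# The hand computations agree with R's built-in functions.
all.equal(xbar, mean(x))
all.equal(s2, var(x))
all.equal(s, sd(x))
```

Note that `var` and `sd` in R use the divisor n − 1, matching the definition of S^2 above.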

Summary

Linear Combination of Normal RVs

Point Estimators

If X is a random variable that depends on parameter β, then often we are interested in estimating the value of β from a random sample X1,…,Xn from X. For example, if X is a normal rv with unknown mean μ, and we get data X1,…,Xn, it would be very natural to estimate μ by the sample mean $\frac{1}{n} \sum_{i=1}^{n} X_i$.

Notationally, we write β^ for our estimate of β computed from the data x1,…,xn.

The two point estimators that we are interested in at this point are point estimators for the mean of an rv and the variance of an rv. (Note that we have switched gears a little bit, because the mean and variance aren’t necessarily parameters of the rv. We are really estimating functions of the parameters at this point. The distinction will not be important for the purposes of this text.) Recall that the definition of the mean of a discrete rv with possible values xi is

$$E[X] = \sum_{i} x_i \, p(x_i).$$

If we are given data x1,…,xn, it is natural to assign p(xi)=1/n for each of those data points to create a new random variable X0, and we obtain

$$E[X_0] = \sum_{i=1}^{n} x_i \cdot \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}.$$

This is a natural choice for our point estimator for the mean of the rv X, as well.

Continuing in the same way, if we are given data x1,…,xn and we wish to estimate the variance, then we could create a new random variable X0 whose possible outcomes are x1,…,xn with probabilities 1/n, and compute

$$\hat{\sigma}^2 = E[(X_0 - \mu)^2] = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2.$$

This works just fine as long as μ is known. However, most of the time, we do not know μ and we must replace the true value of μ with our estimate from the data.

There is a heuristic that each time you replace a parameter with an estimate of that parameter, you reduce the divisor by one. Following that heuristic, we obtain

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Properties of Point Estimators

Note that the point estimators for μ and σ^2 can be thought of as random variables themselves, since they are combinations of a random sample from a distribution. As such, they also have distributions, means and variances.

One property that is often desirable in a point estimator is that it be unbiased. We say that a point estimator β^ for β is unbiased if E[β^]=β. (Recall: β^ is a random variable, so we can take its expected value!) Intuitively, unbiased means that the estimator does not consistently underestimate or overestimate the parameter it is estimating. If we were to estimate the parameter over and over again from fresh samples, the average of the estimates would converge to the correct value of the parameter.

Let’s consider X¯ and S^2. Are they unbiased? It is possible to determine this analytically, and interested students should consult Wackerly et al. for a proof. However, we will do this using R and simulation.
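A minimal sketch of such a simulation follows. The values of `mu`, `sigma` and `n`, the number of replications and the seed are all assumptions made for illustration. We draw many samples, compute the sample mean and sample variance of each, and average the results.

```r
set.seed(1)                  # arbitrary seed, for reproducibility
mu <- 3; sigma <- 9; n <- 4  # assumed values, for illustration only

# Sample mean and sample variance from 20000 independent samples of size n.
xbars <- replicate(20000, mean(rnorm(n, mu, sigma)))
s2s   <- replicate(20000, var(rnorm(n, mu, sigma)))

mean(xbars)  # should be close to mu = 3
mean(s2s)    # should be close to sigma^2 = 81
```

If the averages land close to μ and σ^2, that is consistent with (though of course not a proof of) both estimators being unbiased.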

Variance of Unbiased Estimators

From the above, we can see that X¯ and S^2 are unbiased estimators for μ and σ^2, respectively. There are other unbiased estimators for the mean and variance, however.

For example, if X1,…,Xn is a random sample from a normal rv, then the median of the Xi is also an unbiased estimator for the mean. Moreover, the median seems like a perfectly reasonable thing to use to estimate μ, and in many cases is actually preferred to the mean.

There is one way, however, in which the sample mean is definitely better than the median for normal data, and that is that it has a lower variance. So, in theory at least, X¯ should not deviate from the true mean as much as the median will deviate from the true mean, as measured by variance.
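The comparison can be sketched by simulation. The sample size, number of replications and seed below are assumptions made for illustration; the samples are standard normal, so the true mean is 0.

```r
set.seed(2)  # arbitrary seed
n <- 11      # assumed sample size

means   <- replicate(20000, mean(rnorm(n, 0, 1)))
medians <- replicate(20000, median(rnorm(n, 0, 1)))

# Both estimators are centered near the true mean 0 ...
c(mean = mean(means), median = mean(medians))

# ... but the sample mean varies less from sample to sample.
c(mean = var(means), median = var(medians))
```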

Sample Variance

In many practical situations, the true variance of a population is not known a priori and must be computed somehow. When dealing with extremely large populations, it is not possible to count every object in the population, so the computation must be performed on a sample of the population.[8] Sample variance can also be applied to the estimation of the variance of a continuous distribution from a sample of that distribution.

We take a sample with replacement of n values y1, ..., yn from the population, where n < N and N is the population size, and estimate the variance on the basis of this sample.[9] Directly taking the variance of the sample data gives the average of the squared deviations:

$$\sigma_y^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Here, y¯ denotes the sample mean:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

Since the yi are selected randomly, both y¯ and σ_y^2 are random variables. Their expected values can be evaluated by averaging over the ensemble of all possible samples {yi} of size n from the population. For σ_y^2 this gives:

$$E[\sigma_y^2] = \frac{n-1}{n} \sigma^2$$

Hence σ_y^2 gives an estimate of the population variance that is biased by a factor of (n − 1)/n. For this reason, σ_y^2 is referred to as the biased sample variance. Correcting for this bias yields the unbiased sample variance:

$$s^2 = \frac{n}{n-1} \sigma_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$
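For a small population, the average over all possible samples can be computed exactly rather than simulated. The sketch below uses a hypothetical population of N = 5 values and samples of size n = 3 drawn with replacement; it enumerates all N^n equally likely ordered samples and averages the biased (divisor n) and unbiased (divisor n − 1) sample variances.

```r
pop <- c(1, 2, 4, 7, 11)             # hypothetical population, N = 5
N <- length(pop)
n <- 3                               # sample size, n < N
sigma2 <- mean((pop - mean(pop))^2)  # population variance

# All N^n equally likely ordered samples of size n, drawn with replacement.
samples <- as.matrix(expand.grid(rep(list(pop), n)))

biased   <- apply(samples, 1, function(y) mean((y - mean(y))^2))  # divisor n
unbiased <- apply(samples, 1, var)                                # divisor n - 1

mean(biased) / sigma2    # equals (n - 1) / n
mean(unbiased) / sigma2  # equals 1
```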

Either estimator may be simply referred to as the sample variance when the version can be determined by context. The same proof is also applicable for samples taken from a continuous probability distribution.

The use of the term n − 1 is called Bessel's correction, and it is also used in sample covariance and the sample standard deviation (the square root of variance). The square root is a concave function and thus introduces negative bias (by Jensen's inequality), which depends on the distribution, and thus the corrected sample standard deviation (using Bessel's correction) is biased. The unbiased estimation of standard deviation is a technically involved problem, though for the normal distribution using the term n − 1.5 yields an almost unbiased estimator.
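The bias of the sample standard deviation can be seen in a small simulation. The sketch below assumes normal data with σ = 1 and a small sample size n = 5 (both chosen for illustration) and compares the divisor n − 1 with the divisor n − 1.5.

```r
set.seed(3)          # arbitrary seed
sigma <- 1; n <- 5   # assumed values, for illustration only

# Sum of squared deviations from the sample mean, for 20000 samples.
sum_sq <- replicate(20000, {
  y <- rnorm(n, 0, sigma)
  sum((y - mean(y))^2)
})

s_bessel <- sqrt(sum_sq / (n - 1))    # usual sample standard deviation
s_adj    <- sqrt(sum_sq / (n - 1.5))  # divisor n - 1.5

mean(s_bessel)  # noticeably below sigma = 1
mean(s_adj)     # much closer to sigma = 1
```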

The unbiased sample variance is a U-statistic for the function ƒ(y1, y2) = (y1 − y2)^2/2, meaning that it is obtained by averaging a 2-sample statistic over 2-element subsets of the population.
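This identity is easy to verify numerically. The sketch below uses a hypothetical data vector and averages (y1 − y2)^2/2 over all 2-element subsets of the observed values, then compares the result with `var`.

```r
y <- c(2.1, 3.4, 1.9, 5.0, 4.2)  # hypothetical data

# Kernel of the U-statistic.
f <- function(y1, y2) (y1 - y2)^2 / 2

# combn(y, 2) returns a 2 x choose(n, 2) matrix, one column per pair.
pairs  <- combn(y, 2)
u_stat <- mean(f(pairs[1, ], pairs[2, ]))

all.equal(u_stat, var(y))  # TRUE
```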

Distribution of the sample variance

Figure: Distribution and cumulative distribution of s^2/σ^2, for various values of ν = n − 1, when the yi are independent and normally distributed.

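Being a function of random variables, the sample variance is itself a random variable, and it is natural to study its distribution. In the case that yi are independent observations from a normal distribution, Cochran's theorem shows that s^2 follows a scaled chi-squared distribution:[10]

$$(n-1) \frac{s^2}{\sigma^2} \sim \chi^2_{n-1}$$

As a direct consequence, it follows that

$$E[s^2] = E\left[\frac{\sigma^2}{n-1} \chi^2_{n-1}\right] = \sigma^2$$

and[11]

$$\operatorname{Var}[s^2] = \operatorname{Var}\left[\frac{\sigma^2}{n-1} \chi^2_{n-1}\right] = \frac{\sigma^4}{(n-1)^2} \operatorname{Var}[\chi^2_{n-1}] = \frac{2\sigma^4}{n-1}$$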

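If the yi are independent and identically distributed, but not necessarily normally distributed, then[12][13]

$$E[s^2] = \sigma^2, \qquad \operatorname{Var}[s^2] = \frac{\sigma^4}{n} \left(\kappa - 1 + \frac{2}{n-1}\right) = \frac{1}{n} \left(\mu_4 - \frac{n-3}{n-1} \sigma^4\right)$$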

where κ is the kurtosis of the distribution and μ4 is the fourth central moment.
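For normal data the variance of the sample variance is 2σ^4/(n − 1), which is what the general formula gives when the kurtosis is 3. The sketch below checks both statements; the values of `sigma` and `n`, the number of replications and the seed are assumptions made for illustration.

```r
set.seed(4)           # arbitrary seed
sigma <- 2; n <- 10   # assumed values, for illustration only

# Sample variances from 20000 normal samples of size n.
s2 <- replicate(20000, var(rnorm(n, 0, sigma)))

mean(s2)               # should be close to sigma^2 = 4
var(s2)                # should be close to the value on the next line
2 * sigma^4 / (n - 1)  # Var[s^2] for normal data

# The general iid formula with kurtosis kappa = 3 (the normal case)
# reduces to the same expression.
kappa <- 3
sigma^4 / n * (kappa - 1 + 2 / (n - 1))
```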

If the conditions of the law of large numbers hold for the squared observations, s^2 is a consistent estimator of σ^2. One can see indeed that the variance of the estimator tends asymptotically to zero. An asymptotically equivalent formula was given in Kenney and Keeping (1951:164), Rose and Smith (2002:264), and Weisstein (n.d.).[14][15][16]

Chi-squared

Let Z be a standard normal(0,1) random variable.

An rv with the same distribution as Z^2 is called a chi-squared rv with one degree of freedom. The sum of n independent χ^2 rvs with 1 degree of freedom is a chi-squared rv with n degrees of freedom.

In particular, the sum of an independent χ^2 with ν1 degrees of freedom and χ^2 with ν2 degrees of freedom is χ^2 with ν1+ν2 degrees of freedom.
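Both facts can be checked by simulation. In the sketch below, the degrees of freedom, probability levels, number of replications and seed are assumptions made for illustration. We compare the proportion of simulated values below each theoretical quantile with the nominal probability.

```r
set.seed(5)  # arbitrary seed
probs <- c(0.25, 0.5, 0.9)

# Sum of squares of 3 independent standard normals.
z2sum <- replicate(10000, sum(rnorm(3)^2))
prop_z2 <- sapply(qchisq(probs, df = 3), function(q) mean(z2sum <= q))
prop_z2   # should be close to 0.25, 0.5, 0.9

# Sum of independent chi-squared rvs with 2 and 3 degrees of freedom.
total <- rchisq(10000, df = 2) + rchisq(10000, df = 3)
prop_total <- sapply(qchisq(probs, df = 5), function(q) mean(total <= q))
prop_total  # should again be close to 0.25, 0.5, 0.9
```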

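library(ggplot2)

# Simulate (n - 1) * S^2 / sigma^2 for samples of size n = 4 from a
# normal rv with mean 3 and sd 9, so the scaling factor is 3/81.
sdData <- replicate(10000, 3/81 * sd(rnorm(4, 3, 9))^2)
# Chi-squared density with n - 1 = 3 degrees of freedom, for comparison.
f <- function(x) dchisq(x, df = 3)
ggplot(data.frame(sdData), aes(x = sdData)) + 
  geom_density() +
  stat_function(fun = f, color = "red")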

Theorem
Let X1,…,Xn be iid normal random variables with standard deviation σ. Then
$$\frac{(n-1) S^2}{\sigma^2}$$
has a χ^2 distribution with n−1 degrees of freedom.

F distribution

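An F random variable with ν1 and ν2 degrees of freedom has the same distribution as

$$F_{\nu_1,\nu_2} = \frac{\chi^2_{\nu_1}/\nu_1}{\chi^2_{\nu_2}/\nu_2},$$

where the two chi-squared rvs are independent.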

We say F has ν1 numerator degrees of freedom and ν2 denominator degrees of freedom.
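The definition can be checked by building the ratio from independent chi-squared draws and comparing its quantiles with those from `qf`. The degrees of freedom, probability levels and seed below are assumptions made for illustration.

```r
set.seed(6)           # arbitrary seed
nu1 <- 4; nu2 <- 12   # assumed degrees of freedom

# Ratio of independent chi-squared rvs, each divided by its degrees of freedom.
f_sim <- (rchisq(20000, df = nu1) / nu1) / (rchisq(20000, df = nu2) / nu2)

probs <- c(0.25, 0.5, 0.9)
rbind(simulated   = quantile(f_sim, probs),
      theoretical = qf(probs, df1 = nu1, df2 = nu2))
```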

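One example of this type is:

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2},$$

where X1,…,Xn1 are iid normal with standard deviation σ1 and sample variance S1^2, and Y1,…,Yn2 are iid normal with standard deviation σ2 and sample variance S2^2, independent of the Xi. This ratio has an F distribution with n1−1 numerator and n2−1 denominator degrees of freedom.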
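A simulation sketch of this example follows. The sample sizes, means, standard deviations and seed are assumptions made for illustration. By the theorem on (n−1)S^2/σ^2, the ratio should follow an F distribution with n1 − 1 and n2 − 1 degrees of freedom.

```r
set.seed(7)              # arbitrary seed
n1 <- 6; sigma1 <- 2     # assumed values for the X sample
n2 <- 9; sigma2 <- 5     # assumed values for the Y sample

ratio <- replicate(20000, {
  x <- rnorm(n1, mean = 0,  sd = sigma1)
  y <- rnorm(n2, mean = 10, sd = sigma2)
  (var(x) / sigma1^2) / (var(y) / sigma2^2)
})

# Proportion of simulated ratios below each theoretical F quantile.
probs <- c(0.25, 0.5, 0.9)
prop_ratio <- sapply(qf(probs, df1 = n1 - 1, df2 = n2 - 1),
                     function(q) mean(ratio <= q))
prop_ratio  # should be close to 0.25, 0.5, 0.9
```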

t distribution

If Z is a standard normal(0,1) rv, χ_ν^2 is a chi-squared rv with ν degrees of freedom, and Z and χ_ν^2 are independent, then

$$\frac{Z}{\sqrt{\chi^2_\nu/\nu}}$$

is distributed as a t random variable with ν degrees of freedom.
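As a sketch, we can build this ratio from independent normal and chi-squared draws and compare it with the t distribution through `qt`. The degrees of freedom, probability levels and seed are assumptions made for illustration.

```r
set.seed(8)  # arbitrary seed
nu <- 5      # assumed degrees of freedom

# Standard normal divided by the square root of an independent
# chi-squared rv over its degrees of freedom.
t_sim <- rnorm(20000) / sqrt(rchisq(20000, df = nu) / nu)

probs <- c(0.05, 0.5, 0.95)
prop_t <- sapply(qt(probs, df = nu), function(q) mean(t_sim <= q))
prop_t  # should be close to 0.05, 0.5, 0.95
```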

Theorem

If X1,…,Xn are iid normal rvs with mean μ and sd σ, then

$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$

is t with n−1 degrees of freedom.

Note that (X¯−μ)/(σ/√n) is normal(0,1). So, replacing σ with S changes the distribution from normal to t.
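The effect of replacing σ with S shows up clearly in the tails. The sketch below computes both standardized statistics from the same samples and estimates the probability of exceeding 1.96 in absolute value. The values of `mu`, `sigma` and `n`, the cutoff and the seed are assumptions made for illustration.

```r
set.seed(9)                  # arbitrary seed
mu <- 3; sigma <- 9; n <- 5  # assumed values, for illustration only

# Each column holds the two statistics computed from one sample.
stats <- replicate(20000, {
  x <- rnorm(n, mu, sigma)
  c(z = (mean(x) - mu) / (sigma / sqrt(n)),  # uses the true sigma
    t = (mean(x) - mu) / (sd(x) / sqrt(n)))  # uses the estimate S
})

tail_z <- mean(abs(stats["z", ]) > 1.96)
tail_t <- mean(abs(stats["t", ]) > 1.96)

c(simulated = tail_z, normal = 2 * pnorm(-1.96))
c(simulated = tail_t, t = 2 * pt(-1.96, df = n - 1))
```

With a small sample, the statistic using S has noticeably heavier tails than the one using σ.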

R as a set of statistical tables

One convenient use of R is to provide a comprehensive set of statistical tables.

Functions are provided to evaluate the cumulative distribution function P(X <= x), the probability density function and the quantile function (given q, the smallest x such that P(X <= x) >= q), and to simulate from the distribution.

Distribution        R name     Additional arguments
------------------  ---------  ---------------------
beta                beta       shape1, shape2, ncp
binomial            binom      size, prob
Cauchy              cauchy     location, scale
chi-squared         chisq      df, ncp
exponential         exp        rate
F                   f          df1, df2, ncp
gamma               gamma      shape, scale
geometric           geom       prob
hypergeometric      hyper      m, n, k
log-normal          lnorm      meanlog, sdlog
logistic            logis      location, scale
negative binomial   nbinom     size, prob
normal              norm       mean, sd
Poisson             pois       lambda
signed rank         signrank   n
Student’s t         t          df, ncp
uniform             unif       min, max
Weibull             weibull    shape, scale
Wilcoxon            wilcox     m, n

Prefix the name given here by ‘d’ for the density, ‘p’ for the CDF, ‘q’ for the quantile function and ‘r’ for simulation (random deviates). The first argument is x for dxxx, q for pxxx, p for qxxx and n for rxxx (except for rhyper, rsignrank and rwilcox, for which it is nn). In not quite all cases is the non-centrality parameter ncp currently available: see the on-line help for details.
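A few calls illustrate the naming scheme. The particular arguments and the seed are chosen only for illustration.

```r
dnorm(0)       # density of normal(0,1) at 0, equal to 1/sqrt(2*pi)
pnorm(1.96)    # P(X <= 1.96) for normal(0,1), about 0.975
qnorm(0.975)   # quantile function, about 1.96

set.seed(10)                        # arbitrary seed
draws <- rnorm(3, mean = 10, sd = 2)  # three random deviates
draws

# The p and q functions are inverses of each other for continuous rvs.
qchisq(pchisq(2.5, df = 3), df = 3)  # returns 2.5
```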

The pxxx and qxxx functions all have logical arguments lower.tail and log.p and the dxxx ones have log. This allows, e.g., getting the cumulative (or “integrated”) hazard function, H(t) = - log(1 - F(t)), by

 - pxxx(t, ..., lower.tail = FALSE, log.p = TRUE)

or more accurate log-likelihoods (by dxxx(..., log = TRUE)), directly.
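As an illustration, consider the exponential distribution, whose cumulative hazard is H(t) = rate × t. The rate and time points below are chosen only for illustration. The sketch also shows why the log arguments are more accurate than taking logs afterwards.

```r
rate  <- 0.5
times <- c(0.1, 1, 5, 50)

# Cumulative hazard computed on the log scale.
H <- -pexp(times, rate = rate, lower.tail = FALSE, log.p = TRUE)
all.equal(H, rate * times)  # TRUE

# Far in the tail, 1 - F(t) rounds to 0, so the naive formula gives Inf,
# while the log-scale computation still returns rate * t = 1000.
naive  <- -log(1 - pexp(2000, rate = rate))
stable <- -pexp(2000, rate = rate, lower.tail = FALSE, log.p = TRUE)
c(naive = naive, stable = stable)

# The same applies to log-densities: dnorm(40) underflows to 0.
log(dnorm(40))          # -Inf
dnorm(40, log = TRUE)   # finite: -800 - log(2*pi)/2
```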

In addition there are functions ptukey and qtukey for the distribution of the studentized range of samples from a normal distribution, and dmultinom and rmultinom for the multinomial distribution. Further distributions are available in contributed packages, notably SuppDists.

See the on-line help on RNG for how random-number generation is done in R.