Chapter 1.1 Probability Basics

A primary goal of statistics is to describe the real world based on limited observations.

These observations may be influenced by random factors, such as measurement error or environmental conditions. This chapter introduces probability, which is designed to describe random events. Later, we will see that the theory of probability is so powerful that we intentionally introduce randomness into experiments and studies so we can make precise statements from data.

Definitions

An experiment is a process that produces an observation.
An outcome is a possible observation.
The set of all possible outcomes is called the sample space.
An event is a subset of the sample space.

Example Roll a die and observe the number of dots on the top face. This is an experiment, with six possible outcomes.

The sample space is the set $S = \{1, 2, 3, 4, 5, 6\}$ .

The event “roll higher than 3” is the set $\{4, 5, 6\}$ .

Example Stop a random person on the street and ask them what month they were born. This experiment has the twelve months of the year as possible outcomes. An example of an event $E$ might be that they were born in a summer month, $E = \{\text{June}, \text{July}, \text{August}\}$ .

Example Suppose a traffic light stays red for 90 seconds each cycle. While driving you arrive at this light, and observe the amount of time until the light turns green. The sample space is the interval of real numbers $[0, 90]$ . The event “you didn’t have to wait” is the set $\{0\}$ .

Definition #1

The probability of an event $E$ is a number $P (E)$ between 0 and 1 (inclusive), so $0 \leq P (E) \leq 1$.

Events are a fundamental concept in probability. Therefore, it is important to know some basic set theory, which we summarize in the following definition.

Definition #2

Let $A$ and $B$ be events in a sample space $S$ .

$A \cap B$ is the set of outcomes that are in both $A$ and $B$ .
$A \cup B$ is the set of outcomes that are in either $A$ or $B$ (or both).
$\overline{A}$ is the set of outcomes that are not in $A$ (but are in $S$ ).
$A \setminus B$ is the set of outcomes that are in $A$ and not in $B$ .
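
The set operations above map directly onto Python's built-in set type. The following is a minimal sketch, assuming the one-die sample space from the earlier example; the event "roll an even number" and the variable names are chosen here for illustration and do not come from the text.

```python
# Sample space for one roll of a six-sided die.
S = {1, 2, 3, 4, 5, 6}

# Two illustrative events (the second one is an assumption for this sketch).
A = {4, 5, 6}  # "roll higher than 3"
B = {2, 4, 6}  # "roll an even number"

intersection = A & B   # outcomes in both A and B
union = A | B          # outcomes in A or B (or both)
complement_A = S - A   # outcomes in S that are not in A
difference = A - B     # outcomes in A and not in B

print(intersection, union, complement_A, difference)
```

Note that the complement is always taken relative to the sample space, which is why it is written as `S - A` here.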

Probabilities obey some important rules:

Theorem

Let $A$ , $B$ and $C$ be events in the sample space $S$ .

$P (A \cup B) = P (A) + P (B) - P (A \cap B)$
$P (A) = 1 - P (\overline{A})$ , where $\overline{A} = S \setminus A$ .
If $A$ and $B$ are disjoint, then $P (A \cup B) = P (A) + P (B)$ .
$P (S) = 1$ .
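
As a sanity check, the four rules can be verified numerically on a small sample space. The sketch below assumes a single die whose six outcomes are equally likely, so that a probability is just a ratio of counts; the helper `prob` and the events `A`, `B`, and `D` are hypothetical names used only for this illustration.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # one roll of a six-sided die

def prob(event):
    """P(event), ASSUMING all outcomes in S are equally likely."""
    return Fraction(len(set(event) & S), len(S))

A = {4, 5, 6}  # "roll higher than 3"
B = {2, 4, 6}  # "roll an even number"
D = {1, 3}     # disjoint from A

rule_1 = prob(A | B) == prob(A) + prob(B) - prob(A & B)  # inclusion-exclusion
rule_2 = prob(A) == 1 - prob(S - A)                      # complement rule
rule_3 = prob(A | D) == prob(A) + prob(D)                # additivity, A and D disjoint
rule_4 = prob(S) == 1                                    # total probability

print(rule_1, rule_2, rule_3, rule_4)
```

Exact fractions are used instead of floating-point numbers so that the equalities can be tested without rounding error.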

One way to assign probabilities to events is empirically, by repeating the experiment many times and observing the proportion of times the event occurs. While this can only approximate the true probability, it is sometimes the only approach possible. For example, in the United States the probability of being born in October is noticeably higher than the probability of being born in January, and these values can only be estimated by observing actual patterns of human births.
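
The empirical approach can be illustrated with a small simulation. This is only a sketch: it assumes a simulated fair die from Python's standard `random` module stands in for a real experiment, and the function name `estimate_probability` is hypothetical.

```python
import random

def estimate_probability(event, n_trials, seed=0):
    """Estimate P(event) as the proportion of simulated die rolls that land in it."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return hits / n_trials

E = {4, 5, 6}  # "roll higher than 3"; the true probability is 1/2 for a fair die
for n in (100, 10_000, 100_000):
    print(n, estimate_probability(E, n))
```

The estimates should settle near 0.5 as the number of trials grows, but they remain approximations, which is the limitation described above.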

Another method is to assume that all outcomes are equally likely, usually because of some physical property of the experiment. For example, because (high quality) dice are close to perfect cubes, one believes that all six sides of a die are equally likely to occur. Using the additivity of disjoint events (rule 3 in the theorem above),

$P (\{1\}) + P (\{2\}) + P (\{3\}) + P (\{4\}) + P (\{5\}) + P (\{6\}) = P (\{1, 2, 3, 4, 5, 6\}) = 1.$

Since all six probabilities are equal and sum to 1, the probability of each face occurring is $1 / 6$ . In this case, the probability of an event $E$ can be computed by counting the number of elements in $E$ and dividing by the number of elements in $S$ .

Example

Suppose that two six-sided dice are rolled and the numbers appearing on the dice are observed. The sample space, $S$ , is given by

$\left( \begin{matrix}
(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6) \\
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6) \\
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6) \\
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6) \\
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6) \\
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)
\end{matrix} \right)$

By the symmetry of the dice, we expect all 36 possible outcomes to be equally likely. So the probability of each outcome is $1 / 36$ .

The event “The sum of the dice is 6” is represented by

$E = \{(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)\}$

The probability that the sum of two dice is 6 is given by $P (E) = \frac{| E |}{| S |} = \frac{5}{36},$ which can be obtained by simply counting the number of elements in each set above.
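
The same count can be reproduced by enumerating the sample space. The following is a minimal sketch, assuming (as in the example) that all 36 ordered pairs are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die).
S = list(product(range(1, 7), repeat=2))

# The event "the sum of the dice is 6".
E = [(a, b) for (a, b) in S if a + b == 6]

# With equally likely outcomes, P(E) = |E| / |S|.
p = Fraction(len(E), len(S))
print(len(S), E, p)
```

Changing the condition `a + b == 6` to another target sum gives the probability of that sum in the same way.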

Conditional Probability and Independence

Sometimes when considering multiple events, we have information about one of the events. If you roll two dice and one of them falls off of the table, while the other one shows a 6, that gives you a lot of information about whether the sum of the dice will be 4! Conditional probability formalizes this idea.

Definition #1

The conditional probability of $A$ given $B$ is $P (A | B) = P (A \cap B) / P (B)$, provided that $P (B) > 0$.

Example What is the probability that both dice are 4, given that the sum of two dice is 8?

Solution: Let $A$ be the event “both dice are 4” and $B$ be the event “the sum is 8”. Then, $P (A | B) = P (A \cap B) / P (B) = \frac{1 / 36}{5 / 36} = 1 / 5.$ Note that this is the hardest way to get an 8; the probability that one of the dice is 3 and the other is 5 is 2/5.
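
The calculation can be checked by applying the definition of conditional probability to the enumerated two-dice sample space. This is a sketch under the equally-likely assumption; the helpers `prob` and `cond_prob` and the event name `D` are hypothetical.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # 36 equally likely ordered pairs

def prob(event):
    """P(event) as a ratio of counts (assumes equally likely outcomes)."""
    return Fraction(len(event), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); raises ZeroDivisionError if P(b) is 0."""
    return prob(a & b) / prob(b)

A = {(4, 4)}                               # both dice are 4
B = {o for o in S if sum(o) == 8}          # the sum is 8
D = {o for o in S if sorted(o) == [3, 5]}  # one die is 3 and the other is 5

print(cond_prob(A, B), cond_prob(D, B))
```

Conditioning on $B$ amounts to shrinking the sample space to the five outcomes whose sum is 8 and counting within that smaller set.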

Definition #2

Two events are said to be independent if knowledge that one event occurs doesn’t give any probabilistic information as to whether the other event occurs.

Formally, we say that $A$ and $B$ are independent if $P (A | B) = P (A)$ or, equivalently, that $P (B | A) = P (B)$.

Theorem

(The multiplication rule for independent events) If $A$ and $B$ are independent, then $P (A \cap B) = P (A) P (B)$.

The multiplication rule is often used as the definition of independence, because it works even in the case when $P (A) = 0$ or $P (B) = 0$ . However, the intuition is clearer in the conditional probability definition.

Example Two dice are rolled. Let $A$ be the event “The first die is a 5”, let $B$ be the event “The sum of the dice is 7”, and let $C$ be the event “The sum of the dice is 8.” Show that $A$ and $B$ are independent, but $A$ and $C$ are dependent.

Note that $P (B) = 6 / 36 = 1 / 6$ .

Now, $P (B | A) = P (A \cap B) / P (A) = \frac{1 / 36}{1 / 6} = 1 / 6$ .

Therefore, $A$ and $B$ are independent.

However, $P (C) = 5 / 36$ , which is not the same as $P (C | A) = \frac{1 / 36}{1 / 6} = 1 / 6$ . Therefore, $A$ and $C$ are not independent.

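We note here that $B$ and $C$ are also not independent: the sum cannot be both 7 and 8, so $P (B \cap C) = 0$, while $P (B) P (C) = \frac{1}{6} \cdot \frac{5}{36} > 0$.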
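
All three pairs of events can be checked at once with the multiplication rule. The sketch below assumes equally likely outcomes for the two dice; the helper names `prob` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # 36 equally likely ordered pairs

def prob(event):
    """P(event) as a ratio of counts (assumes equally likely outcomes)."""
    return Fraction(len(event), len(S))

def independent(x, y):
    """Test independence through the multiplication rule P(x and y) = P(x) P(y)."""
    return prob(x & y) == prob(x) * prob(y)

A = {o for o in S if o[0] == 5}    # the first die is a 5
B = {o for o in S if sum(o) == 7}  # the sum of the dice is 7
C = {o for o in S if sum(o) == 8}  # the sum of the dice is 8

print(independent(A, B), independent(A, C), independent(B, C))
```

The multiplication rule is used here rather than the conditional form because it needs no division and therefore also works for events of probability zero.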

Counting Arguments

Using probability rules, one can often convert the computation of a probability to a problem of counting elements in a set. This leads to many problems which fall under the general title of enumerative combinatorics, a large and interesting field of mathematics. The interested reader should see ??? for some examples of this type of reasoning.

In this book, however, we will only use a few basic types of counting arguments.

Proposition 2.1 (Rule of product) If there are $m$ ways to do something, and $n$ ways to do another thing, then there are $m \times n$ ways to do both things.

Proposition 2.2 (Combinations) The number of ways of choosing $k$ distinct objects from a set of $n$ is given by $\binom{n}{k} = \frac{n!}{k! (n - k)!}$.

The R command for computing $\binom{10}{3}$ is choose(10,3).

Example

A coin is tossed 10 times. Some possible outcomes are HHHHHHHHHH, HTHTHTHTHT, and HHTHTTHTTT. Since each toss has two possibilities, the rule of product says that there are $2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 = 2^{10} = 1024$ possible outcomes for the experiment. We expect each possible outcome to be equally likely, so the probability of any single outcome is 1/1024.

Let $E$ be the event “We flipped exactly three heads”. This might happen as the sequence HHHTTTTTTT, or TTTHTHTTHT, or many other ways. What is $P (E)$ ? To compute the probability, we need to count the number of possible ways that three heads may appear. Since the three heads may appear in any of the ten slots, the answer is

$| E | = \binom{10}{3} = \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 120.$

Then $P (E) = 120 / 1024 \approx 0.117$ .
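
Both counting rules can be confirmed by brute force. The sketch below is in Python rather than R, where `math.comb` (available from Python 3.8) plays the role of choose; the variable names are illustrative, and each of the 1024 sequences is assumed to be equally likely, as in the example.

```python
from fractions import Fraction
from itertools import product
from math import comb

n_tosses, n_heads = 10, 3

# Rule of product: every sequence of 10 tosses, each toss being H or T.
outcomes = list(product("HT", repeat=n_tosses))

# The event "exactly three heads", found by direct enumeration.
E = [o for o in outcomes if o.count("H") == n_heads]

# Combinations: the same count from the binomial coefficient.
count_by_formula = comb(n_tosses, n_heads)

p = Fraction(len(E), len(outcomes))
print(len(outcomes), len(E), count_by_formula, float(p))
```

Enumeration is practical only for small experiments, since the number of sequences doubles with every extra toss; the formula gives the count without listing them.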