R Programming & Statistics Notes

In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of $n$ randomly chosen people, some pair of them will have the same birthday.

By the pigeonhole principle, the probability reaches 100% when the number of people reaches 367 (since there are only 366 possible birthdays, including February 29). However, 99.9% probability is reached with just 70 people, and 50% probability with 23 people. These conclusions are based on the assumption that each day of the year (excluding February 29) is equally probable for a birthday.

It may seem surprising that the probability is above 50% when there is a pair with the same birthday for a group as small as 23 individuals. This is made more plausible when considering that a comparison will actually be made between every possible pair of people rather than fixing one individual and comparing him or her solely to the rest of the group.

Real-world applications for the birthday paradox include a cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of finding a collision for a hash function.

The history of the problem is obscure. W. W. Rouse Ball indicated (without citation) that it was first discussed by Harold Davenport.^[1] However, Richard von Mises proposed an earlier version of what is considered today to be the birthday problem.^[2] The problem was featured by Martin Gardner in his April 1957 "Mathematical Games" column in Scientific American.

The computed probability of at least two people sharing a birthday versus the number of people

Calculating the probability

The problem is to compute an approximate probability that in a group of $n$ people at least two have the same birthday. For simplicity, variations in the distribution, such as leap years, twins, seasonal or weekday variations are disregarded, and it is assumed that all 365 possible birthdays are equally likely. (Real-life birthday distributions are not uniform, since not all dates are equally likely, but these irregularities have little effect on the analysis.^{[nb 1]})

The goal is to compute $P (A)$ , the probability that at least two people in the room have the same birthday. However, it is simpler to calculate $P (A')$ , the probability that no two people in the room have the same birthday. Then, because $A$ and $A'$ are the only two possibilities and are also mutually exclusive, $P (A) = 1 - P (A').$

In deference to widely published solutions concluding that 23 is the minimum number of people necessary to have a $P (A)$ that is greater than 50%, the following calculation of $P (A)$ will use 23 people as an example. If one numbers the 23 people from 1 to 23, the event that all 23 people have different birthdays is the same as the event that person 2 does not have the same birthday as person 1, and that person 3 does not have the same birthday as either person 1 or person 2, and so on, and finally that person 23 does not have the same birthday as any of persons 1 through 22. Let these events respectively be called "Event 2", "Event 3", and so on. One may also add an "Event 1", corresponding to the event of person 1 having a birthday, which occurs with probability 1. This conjunction of events may be computed using conditional probability: the probability of Event 2 is 364/365, as person 2 may have any birthday other than the birthday of person 1. Similarly, the probability of Event 3 given that Event 2 occurred is 363/365, as person 3 may have any of the birthdays not already taken by persons 1 and 2. This continues until finally the probability of Event 23 given that all preceding events occurred is 343/365. Finally, the principle of conditional probability implies that $P (A')$ is equal to the product of these individual probabilities:

$P(A')={\frac {365}{365}}\times {\frac {364}{365}}\times {\frac {363}{365}}\times {\frac {362}{365}}\times \cdots \times {\frac {343}{365}}$

The terms of equation (1) can be collected to arrive at:

$P(A')=\left({\frac {1}{365}}\right)^{23}\times (365\times 364\times 363\times \cdots \times 343)$

(2)

Evaluating equation (2) gives $P (A') \approx 0.492703$

Therefore, $P (A) \approx 1 - 0.492703 = 0.507297$ (50.7297%).

This process can be generalized to a group of $n$ people, where $p (n)$ is the probability of at least two of the $n$ people sharing a birthday. It is easier to first calculate the probability $p (n)$ that all $n$ birthdays are different. According to the pigeonhole principle, $p (n)$ is zero when $n > 365$ . When $n \leq 365$ :

{\begin{aligned}{\bar {p}}(n)&=1\times \left(1-{\frac {1}{365}}\right)\times \left(1-{\frac {2}{365}}\right)\times \cdots \times \left(1-{\frac {n-1}{365}}\right)\\&={\frac {365\times 364\times \cdots \times (365-n+1)}{365^{n}}}\\&={\frac {365!}{365^{n}(365-n)!}}={\frac {n!\cdot {\binom {365}{n}}}{365^{n}}}={\frac {_{365}P_{n}}{365^{n}}}\end{aligned}}

where $!$ is the factorial operator, $(365 n)$ is the binomial coefficient and $k P r$ denotes permutation.

The equation expresses the fact that the first person has no one to share a birthday, the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as either of the first two (363/365), and in general the $n$ th birthday cannot be the same as any of the $n - 1$ preceding birthdays.

The event of at least two of the $n$ persons having the same birthday is complementary to all $n$ birthdays being different. Therefore, its probability $p (n)$ is

p(n)=1-{\bar {p}}(n).

The following table shows the probability for some other values of

n

(this table ignores the existence of leap years, as described above, as well as assuming that each birthday is equally likely):

$n$	$p (n)$
1	0.0%
5	2.7%
10	11.7%
20	41.1%
23	50.7%
30	70.6%
40	89.1%
50	97.0%
60	99.4%
70	99.9%
75	99.97%
100	99.99997%
200	99.9999999999999999999999999998%
300	(100 − 6×10⁻⁸⁰)%
350	(100 − 3×10⁻¹²⁹)%
365	(100 − 1.45×10⁻¹⁵⁵)%
366	100%
367	100%

[1]

[2]

[nb 1]

Chapter 1.1.2 Birthday Problem

Calculating the probability