User:Physikerwelt/test/Birthday problem

From Wikipedia, the free encyclopedia

Calculating the probability[edit]

The problem is to compute the approximate probability that in a group of n people, at least two have the same birthday. For simplicity, disregard variations in the distribution, such as leap years, twins, seasonal or weekday variations, and assume that the 365 possible birthdays are equally likely. Real-life birthday distributions are not uniform since not all dates are equally likely.[1]

The goal is to compute P(A), the probability that at least two people in the room have the same birthday. However, it is simpler to calculate P(A'), the probability that no two people in the room have the same birthday. Then, because A and A' are the only two possibilities and are also mutually exclusive, P(A) = 1 − P(A').

In deference to widely published solutions concluding that 23 is the minimum number of people necessary to have a P(A) that is greater than 50%, the following calculation of P(A) will use 23 people as an example.

When events are independent of each other, the probability of all of the events occurring is equal to a product of the probabilities of each of the events occurring. Therefore, if P(A') can be described as 23 independent events, P(A') could be calculated as P(1) × P(2) × P(3) × ... × P(23).

The 23 independent events correspond to the 23 people, and can be defined in order. Each event can be defined as the corresponding person not sharing his/her birthday with any of the previously analyzed people. For Event 1, there are no previously analyzed people. Therefore, the probability, P(1), that Person 1 does not share his/her birthday with previously analyzed people is 1, or 100%. Ignoring leap years for this analysis, the probability of 1 can also be written as 365/365, for reasons that will become clear below.

For Event 2, the only previously analyzed people are Person 1. Assuming that birthdays are equally likely to happen on each of the 365 days of the year, the probability, P(2), that Person 2 has a different birthday than Person 1 is 364/365. This is because, if Person 2 was born on any of the other 364 days of the year, Persons 1 and 2 will not share the same birthday.

Similarly, if Person 3 is born on any of the 363 days of the year other than the birthdays of Persons 1 and 2, Person 3 will not share their birthday. This makes the probability P(3) = 363/365.

This analysis continues until Person 23 is reached, whose probability of not sharing his/her birthday with people analyzed before, P(23), is 343/365.

P(A') is equal to the product of these individual probabilities:

The terms of equation (1) can be collected to arrive at:

Evaluating equation (2) gives P(A') ≈ 0.492703

Therefore, P(A) ≈ 1 − 0.492703 = 0.507297 (50.7297%)

This process can be generalized to a group of n people, where p(n) is the probability of at least two of the n people sharing a birthday. It is easier to first calculate the probability p(n) that all n birthdays are different. According to the pigeonhole principle, p(n) is zero when n > 365. When n ≤ 365:

where '!' is the factorial operator, is the binomial coefficient and denotes permutation.

The equation expresses the fact that the first person has no one to share a birthday, the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as the first two (363/365), and in general the nth birthday cannot be the same as any of the n − 1 preceding birthdays.

The event of at least two of the n persons having the same birthday is complementary to all n birthdays being different. Therefore, its probability p(n) is

This probability surpasses 1/2 for n = 23 (with value about 50.7%). The following table shows the probability for some other values of n (this table ignores the existence of leap years, as described above, as well as assuming that each birthday is equally likely):

The probability that no two people share a birthday in a group of n people. Note that the vertical scale is logarithmic (each step down is 1020 times less likely).
n p(n)
1 0.0%
5 2.7%
10 11.7%
20 41.1%
23 50.7%
30 70.6%
40 89.1%
50 97.0%
60 99.4%
70 99.9%
100 99.99997%
200 99.9999999999999999999999999998%
300 (100 − (6×10−80))%
350 (100 − (3×10−129))%
365 (100 − (1.45×10−155))%
366 100%
367 100%
  1. ^ Cite error: The named reference nonuniform birthdays was invoked but never defined (see the help page).