Talk:Birthday problem/Archive 1

From Wikipedia, the free encyclopedia

Old Discussion

I think the birthday paradox is one of those problems where you have to be careful not to run into limited precision floating point problems, but I'm not sure. Anyone know? Martin

The problem doesn't seem to be overly sensitive to limited precision. Here's a bc program:

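/* Repeat the search with 1 to 9 digits kept after the decimal point. */
for (scale = 1; scale<10; scale ++){
  prod = 1
  /* After the i-th pass, prod is the chance that i+1 people share no birthday. */
  for (i=1; prod>0.5; i++) {
    prod = prod*((365-i)/365)
  }
  /* The final i++ runs before the loop test fails, so i is the group size. */
  print "scale=",scale," i=",i, " prod=", prod, "\n"
}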

(In bc, the variable scale is the number of digits used after the decimal point.) The output is

scale=1 i=6 prod=.5
scale=2 i=20 prod=.48
scale=3 i=23 prod=.481
scale=4 i=23 prod=.4915
scale=5 i=23 prod=.49260
scale=6 i=23 prod=.492690
scale=7 i=23 prod=.4927016
scale=8 i=23 prod=.49270264
scale=9 i=23 prod=.492702751

which means that the correct answer i=23 is produced already with three-decimal-digit precision. The results are even better if one uses prod*(1-i/365) instead of prod*((365-i)/365). AxelBoldt 20:55, 19 Feb 2004 (UTC)
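As a cross-check that avoids rounding altogether, the same search can be run in exact rational arithmetic. The following is a minimal Python sketch, not part of the original bc session; the function names are made up for illustration, and it assumes 365 equally likely birthdays.

```python
from fractions import Fraction

def no_match_probability(n, days=365):
    """Exact probability that n people all have different birthdays."""
    prob = Fraction(1)
    for k in range(1, n):
        prob *= Fraction(days - k, days)
    return prob

def smallest_group(days=365):
    """Smallest n for which a shared birthday is more likely than not."""
    n = 1
    while no_match_probability(n, days) > Fraction(1, 2):
        n += 1
    return n

print(smallest_group())                    # 23
print(float(no_match_probability(23)))     # roughly 0.4927
```

Because every intermediate value is an exact fraction, any disagreement with a floating-point or fixed-scale run would point to rounding in that run rather than in this one.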


The theory behind it was described in the American Mathematical Monthly in 1938 in Zoe Emily Schnabel's The estimation of the total fish population of a lake, under the name of capture-recapture statistics.

I'm not sure what "the theory behind" the birthday paradox is, but I'm sure that people like Pascal, Fermat, maybe even Cardano knew the formula or could have derived it with ease. They would probably have considered questions like "you throw a fair die 3 times; how likely is it that you get three different numbers?" AxelBoldt 20:11, 19 Feb 2004 (UTC)
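The die question has the same structure as the birthday computation, just with 6 "days" and 3 "people". A small Python sketch (the function name is hypothetical) makes the parallel explicit:

```python
from fractions import Fraction

def all_different(sides, throws):
    """Exact probability that `throws` rolls of a fair `sides`-sided die are all distinct."""
    prob = Fraction(1)
    for k in range(throws):
        # The (k+1)-th roll must avoid the k values already seen.
        prob *= Fraction(sides - k, sides)
    return prob

print(all_different(6, 3))   # 5/9, i.e. (6*5*4)/6^3
```

Calling `all_different(365, 23)` gives the no-match probability for the birthday problem itself.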


What is the intuition of most people in this case? I really don't know. It doesn't contradict my intuition. Andries 20:17, 16 May 2004 (UTC)

Most people find the probability of a match surprisingly high. That's why it was given the (not quite correct) name birthday paradox. -- Nunh-huh 20:24, 16 May 2004 (UTC)
I believe most people have two problems with this: underestimating the number of matches that don't include themselves, and underestimating the multiplicative effect of each additional person added to the total. I imagine the thinking is similar to this: 365/2 = ~180 people will give me 50% coverage of birthdates. This disregards the fact that 180 birthdates chosen randomly are likely to have many collisions. Clearly, 180 is far from the 23 persons actually needed for a 50% probability.
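To put a number on "many collisions": under the usual assumption of 365 equally likely, independent birthdays, the expected count of distinct birthdays among n people is 365·(1 − (364/365)^n). The sketch below (function name chosen for illustration) evaluates it for 180 people.

```python
def expected_distinct(n, days=365):
    """Expected number of different birthdays among n people (uniform, independent)."""
    return days * (1 - (1 - 1 / days) ** n)

distinct = expected_distinct(180)
print(round(distinct))          # about 142 distinct days
print(round(180 - distinct))    # about 38 people land on a day already taken
```

So 180 people cover well under half of the calendar on average, which is why the "365/2" intuition is misleading in both directions.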

As far as I can see, the series of inequalities on the bottom of the page is wrong. The wrong step is the inequality that goes from the sum to the integral. If you actually do compute the sum (which isn't that hard to do), you get ; if you evaluate the integral you get , but , not the other way around!

So perhaps the editors should either remove this part from the article or replace the < by ~ (approximately equal). (Or I could be wrong; it wouldn't be the first time.)

But if I'm actually right, then Paul Halmos has no right to be so smug about "tools that all students of mathematics have access to"...

Fun,

Sten

If one quotes from Halmos, one should quote exactly. If Halmos got the math wrong, a separate comment should say so. I think the reasoning can be fixed fairly easily, and I would not do it by saying "approximately equal"; I would use inequalities ("<" or "≤"). Unfortunately I will have to wait until tomorrow to look at Halmos' book. Michael Hardy 18:04, 7 Jul 2004 (UTC)

Notations for logarithms

I'm very surprised by the very last paragraph of the article. I have a mathematical education, so I'm not a "non-mathematician". I don't know about logarithms in the English tradition, but in Russian `ln' stands for the natural logarithm only, `lg' stands for the base-10 logarithm, and `log' can be a general logarithm with arbitrary base (the base is specified as a subscript; if it is omitted then the base typically doesn't matter, for instance in expressions like O(log n)).

I think it would be fair to remove the reference to "non-mathematicians" as offensive. I accept that `log' can mean natural logarithm in the English tradition, but I think it's fair to take into account that for some (mathematically educated) people it is exactly `ln' that stands for the natural logarithm.

Paul Pogonyshev
I agree and have amended accordingly. Please feel free to comment and/or revert. -- ALoan (Talk) 09:57, 16 Jul 2004 (UTC)
Since it's not part of the Halmos quote I won't be militant about it. If the Halmos quote had gone on a bit longer one would have seen him using "log" to mean natural log. Nowadays both "ln" and "log" are commonly understood by mathematicians to mean natural log. When Halmos wrote his autobiography, only about 20 years ago, he expressed contempt for the practice of many non-mathematicians of using "ln" rather than "log" for natural log, and said no mathematician had ever done that. By 1984, the date of publication, his claim was exaggerated. Nonetheless, it is still not unusual today to find mathematicians using "log" for natural log. Michael Hardy 19:59, 16 Jul 2004 (UTC)

Direct Solution

Does anyone know how to derive a direct solution to the birthday paradox? (I.e., the way it is done in the article, the probability that it is not true is derived and then subtracted from one; I want to see a method to get the probability of a match without solving for the other one first.) I know that the article's method is correct, but would be intrigued by the alternative solution because I attempted (and failed) to solve it that way. If someone knows it, I'd appreciate it if they posted it in the article. Superm401 04:58, 15 Jan 2005 (UTC)

This method of switching to the complement is really a pretty and quite powerful trick; it's next to impossible to solve the problem without it. You'd have to distinguish between and deal separately with numerous cases: exactly one matching pair, exactly two matching pairs, exactly three matching pairs, ..., exactly one matching triple, exactly two matching triples,..., one matching quadruple,..., exactly one matching pair and one matching triple, ... etc. (and each of those cases is harder than the single case you have to solve when you switch to the complement). In the end, you'd have to do superhuman algebra to simplify your huge sum down to our tiny little formula. AxelBoldt 03:50, 26 Apr 2005 (UTC)
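For very small cases, the "direct" answer can at least be obtained by brute force, which is a handy sanity check on the complement formula. The Python sketch below is illustrative only (hypothetical function names, a toy calendar of a few equally likely days); it enumerates every possible assignment of birthdays rather than splitting into the pair/triple cases described above.

```python
from fractions import Fraction
from itertools import product

def match_by_enumeration(n, days):
    """Directly count the birthday assignments that contain at least one repeat."""
    hits = sum(1 for chosen in product(range(days), repeat=n)
               if len(set(chosen)) < n)
    return Fraction(hits, days ** n)

def match_by_complement(n, days):
    """One minus the probability that all n birthdays differ."""
    no_match = Fraction(1)
    for k in range(1, n):
        no_match *= Fraction(days - k, days)
    return 1 - no_match

print(match_by_enumeration(3, 6), match_by_complement(3, 6))   # both 4/9
```

The enumeration visits days^n assignments, so it is hopeless for 365 days and 23 people; the complement formula needs only n − 1 multiplications.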

Leap Years

The article states that for 366 or more people, the probability is 100%. But what if someone was born on Feb 29th in a leap year? Does this need a correction? Also, how is the probability for n=366 affected by the chance that one or more people were born on a leap day?

The article makes it clear that a year is assumed to have 365 days, plus other variables (distribution of birth dates through the year, incidence of twins and other multiple births, etc) are ignored. -- ALoan (Talk) 10:47, 17 Feb 2005 (UTC)
Perhaps a section on the complexities introduced by considering leap years would be informative. Actually in a school setting (where students tend to be born in a narrow range relative to the 4-year leap year cycle) you would need to take into account the demographics of the population. While this would make the problem too messy to solve in any clean way, it would illustrate the fragile nature of a closed form solution. -- Jake 01:22, 18 October 2005 (UTC)
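One simple way to explore the leap-day question is a toy model with 366 possible birthdays, where Feb 29 is a quarter as likely as any other day. That weighting (1/1461 versus 4/1461, ignoring century rules, seasonality and demographics) is an assumption made purely for this sketch, as are the function names.

```python
from fractions import Fraction
from math import comb, factorial

def no_match_with_leap_day(n):
    """Probability that n >= 1 people all have different birthdays when
    Feb 29 has weight 1/1461 and each of the other 365 days has weight 4/1461."""
    a = Fraction(4, 1461)   # an ordinary day
    b = Fraction(1, 1461)   # Feb 29
    # Either nobody was born on Feb 29, or exactly one person was.
    ways = comb(365, n) * a**n + comb(365, n - 1) * a**(n - 1) * b
    return factorial(n) * ways

def smallest_group_with_leap_day():
    """Smallest n for which a shared birthday is more likely than not."""
    n = 1
    while no_match_with_leap_day(n) > Fraction(1, 2):
        n += 1
    return n

print(smallest_group_with_leap_day())       # 23
print(no_match_with_leap_day(366) > 0)      # True: a match is not yet certain
print(no_match_with_leap_day(367) == 0)     # True: now a match is forced
```

Under this model the threshold stays at 23, while the point of certainty moves from 366 people to 367.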

Proposal to exclude Paul Halmos from this article

The entire "An (non-fatal) error in Halmos' argument (whose general idea is right)" section (and, if that explanation is needed, the entire section preceding it with the argument) is really long-winded and almost seems like original research rather than something that really belongs in an article like this one. I'm very tempted to remove both sections or at least the latter...

I did remove the C program example, though. Completely superfluous. Nice programming, perhaps, but just not needed. Daniel Quinlan 08:54, Feb 23, 2005 (UTC)

It would be unfortunate if the section on Halmos' argument were removed. The reason why that section is there is explained in that section; why don't you attempt to say what you disagree with in that? "Original research" would be things appearing for the first time here in this article, whereas Halmos' comments were published a couple of decades ago, and the article says so, so I find this "original research" comment absurd. Michael Hardy 00:10, 24 Feb 2005 (UTC)
Well, at present, that section says "one guy said this; he was slightly wrong; here is what he should have said" whereas it would be better, I think, to jump to the conclusion and present the mathematical (as opposed to numerical) view, and then explain that the analysis follows that published by Halmos, but that he was slightly wrong. It seems rather unbalancing to have such a large section of the article dedicated to such a seemingly unimportant part of the topic, but perhaps I underestimate its importance? Could you explain why it is important? -- ALoan (Talk) 11:42, 24 Feb 2005 (UTC)
"A seemingly unimportant part of the topic"?? My initial reaction is that the whole topic is unimportant except because of that section.
But OK, let's look at it from the point of view of those who've never seen it before; such people do exist. The topic does have pedagogical value apart from the section that mentions Halmos. But one does not remain in that early stage of learning forever; the point of Halmos' view is that there's more to this problem than what you learned in childhood when you first saw it; there is such a thing as thinking mathematically, and as Halmos says, the tools involved are things that every student of mathematics should know. Michael Hardy 18:51, 24 Feb 2005 (UTC)

After writing the above I looked at the article with a view toward making that section shorter, along the following lines: I would give the fully correct argument, stating that it was an adaptation of the one from Halmos' autobiography, and quote only those parts of Halmos that say why he considers that point of view important. But the quoted paragraph of Halmos is such a beautiful example of how to write that that would not be easy. Please read Halmos' words CAREFULLY. If after that you don't appreciate it, I'll just diagnose you as a vulgar caveman. Michael Hardy 19:09, 24 Feb 2005 (UTC)

It is a nice quote, and it says some important things in a clear way, but it is less about the topic of this article (the birthday paradox) and more about mathematics in general. Or perhaps I am just a vulgar caveman... -- ALoan (Talk) 20:25, 24 Feb 2005 (UTC)

It is very directly on the topics of this article; it differs from the merely numerical computation of the probabilities by looking at the topic of this article from the point of view of "mathematics in general". Michael Hardy 20:57, 24 Feb 2005 (UTC)

Well, it's 2 against 1, so in the spirit of "be bold", I nuked the sections. I encourage someone to replace it with a more direct proof. Daniel Quinlan 07:40, Mar 2, 2005 (UTC)

OK, I hope you won't take it personally if I say this proves illiterates outnumber the rest of us — I'll fix the vandalism when I have time. Maybe I'll make a separate article titled an example of Paul Halmos's beautiful writing and link to it. Michael Hardy 23:01, 2 Mar 2005 (UTC)

Now I've restored that section. I hope I've made it dull enough so that the persons who wanted to delete it will find it at least tolerable. Michael Hardy 02:58, 3 Mar 2005 (UTC)

Thanks. I much prefer the dull version. -- ALoan (Talk) 10:32, 3 Mar 2005 (UTC)

I also find the Halmos quote, even in this "dull" version, vastly redundant, fairly obfuscated and downright showy. (I didn't even bother reading the original version.) What I don't like about it is:

1. It's too high in the article. If we really have to live with it, it should be made an internal or external link, or at the very least pushed at the bottom, possibly in microscopic size, as a historical/anecdotal note.

2. The title, and the whole section to be honest, is misleading at the very least. "Numerical" can never be opposed to "mathematical". I could accept opposing "numerical" to "analytical", but even so the section does present numerical calculations.

3. The section does not make it clear whether the error in the original argument is the one explained at the bottom of the section, or it has been silently corrected in the derivation.

4. As the last sentence points out, the section is not an alternative derivation of results previously obtained, which by the way are not "numerical", but perfectly analytical. It's not about having a different view on this problem, and even less a different approach to its solution; it's about picking this problem to illustrate a perhaps interesting, but unrelated concept.

Wouldn't this section be more relevant in Halmos' page?

As an aside, people are entitled to disagree, and calling them illiterates, cavemen, and vandals because of that is not what I expect from an admin. And by the way, it's now three against one. --PizzaMargherita 20:31, 2 October 2005 (UTC)

Note, the above was posted by User:195.137.39.109 and then edited by User:PizzaMargherita. I'm a bit confused too. — Ambush Commander(Talk) 21:49, 2 October 2005 (UTC)
Yeah, that's still me. I got myself an account in the meantime. By the way, as a result of my recent streamlining of the article, this "Halmos" section has been pushed down because it was in the way. It was not my intention to act on this issue without civilly waiting for feedback on my comment above.

In response to PizzaMargherita's criticisms: I've changed the title of the section. However:

  • I think it's too low in the article, and if "it's three against one" I think that's only because the page as a whole has been neglected by most Wikipedians concerned with mathematics.
I think this attitude is so lame (especially of an administrator aspiring bureaucrat) that I'm considering reporting it. Nobody agrees with me, therefore they are not representative of the entire population (or, in your even less noble words, illiterate cavemen). What next, hiring sock puppets to support your argument? Get over it.--PizzaMargherita 23:49, 2 October 2005 (UTC)
  • There is no error explained at the bottom of the section. The article says nothing specific about what the error was. (As you can see for yourself.) In trying to guess what you think is an error explained at the bottom of the section, I can only speculate that you read the last paragraph: Halmos' derivation shows only that at most 23 people are needed to ensure a birthday match with even chance; since we don't know how sharp the given inequalities are, the argument leaves open the possibility that n = 22 could also work. It would not have occurred to me that anyone might think that this explains any error, unless it was because of the word "However". I've deleted that word.
  • No, it would not make more sense in the page on Paul Halmos. It's obviously not among his most important writings, and the article about him is short as it stands. When more is added, this little thing should be a long way down the queue. However, don't you think this article is better if the reader finds out something beyond what he or she learned in high school -- that this isn't necessarily only a secondary-school-level topic? Why limit the whole article to things that any freshman who thinks the problem through would figure out?

Michael Hardy 22:33, 2 October 2005 (UTC)
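The "at most 23" point can be checked numerically. The sketch below assumes the bound in question has the form e^(−n(n−1)/730), which is what the 1 − x ≤ e^(−x) inequality yields for 365 equally likely birthdays; the function names are illustrative and not taken from the article.

```python
from math import exp

def no_match_exact(n, days=365):
    """Probability that n people all have different birthdays (floating point)."""
    prob = 1.0
    for k in range(1, n):
        prob *= 1 - k / days
    return prob

def no_match_bound(n, days=365):
    """Upper bound exp(-n(n-1)/(2*days)), obtained from 1 - x <= exp(-x)."""
    return exp(-n * (n - 1) / (2 * days))

for n in (22, 23):
    print(n, no_match_exact(n), no_match_bound(n))
```

The bound falls just below 1/2 at n = 23 but is still above 1/2 at n = 22, so on its own it cannot exclude 22. The exact product, about 0.524 for 22 people, is what settles that 23 is the smallest group.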

Because Wikipedia should be complete, but also minimal. I'm sure Paul Halmos has expressed his view on many other things, not only on how the solution of the birthday problem should be taught to students (dear me, I can't believe I just wrote that). He may also have voiced concerns about Disk algebra and even the Archbishopric of Salzburg, but you don't go around the Wikipedia littering those articles with his view, do you?
Conversely, I'm pretty sure that he was not the only eminent person to have said something vaguely related to this problem. But I can't see anybody else adding those people's 2 cents to this page, can you?
As you point out, not even Paul Halmos' page would welcome this section, so I stand corrected. I'm afraid its home, if anywhere on the web, is in your private website, where you are more than welcome to have an entire page bragging about how you (think you have) discovered a mistake in Paul Halmos' reasoning.--PizzaMargherita 23:49, 2 October 2005 (UTC)
I agree that the quotation from Halmos is a nice piece of writing, and I strongly sympathise with his point of view, but I don't think it belongs in this article in its entirety. Definitely the mathematical argument should be retained, with appropriate attribution, and in fact I think it should be moved closer to the top of the article. The only part of the quotation which is really relevant is "the inequalities can be obtained in a minute or two, whereas the multiplications would take much longer, and be much more subject to error, whether the instrument is a pencil or an old-fashioned desk computer", and I think something along these lines is definitely worth mentioning. Perhaps "The significance of the following argument is that it enables one to estimate the cut-off point without needing to perform a whole series of multiplications" or something similar instead. Dmharvey Talk 00:25, 3 October 2005 (UTC)
I am strongly against moving this section anywhere other than further down or in the bin, as it would break (and confuse) the logical and linear (at least to me) exposition as it stands up to that point.--PizzaMargherita 01:49, 3 October 2005 (UTC)

I am not the one who discovered the error. You can see who it was if you read this discussion page.

Your reasoning makes no sense:

He may also have voiced concerns about Disk algebra and even the Archbishopric of Salzburg, but you don't go around the Wikipedia littering those articles with his view, do you?

That is ridiculous!! No, one should not put every eminent person's opinion on every subject into every article. But that is irrelevant here. Halmos' idea was (obviously!) not included simply because he's an eminent person who said something about this. It's included because it's relevant; it's something the reader could be expected to be glad to know, NOT about Halmos, but about the birthday paradox. As I said, it's the only part of the article that takes the math beyond the point where any secondary-school student who thinks it through would take it. One reads an article like this in order to learn more than one would figure out for oneself. Michael Hardy 00:54, 3 October 2005 (UTC)

Sorry for missing who spotted the error (it was in another section of this page). However, all my other arguments still stand, and I still respectfully disagree.--PizzaMargherita 01:49, 3 October 2005 (UTC)

... oh, and how is it that you find this part "obfuscated"?? Halmos always writes with beautiful clarity. And this is a good example of that. Michael Hardy 01:21, 3 October 2005 (UTC)

Unfortunately you don't seem to follow the example.--PizzaMargherita 01:49, 3 October 2005 (UTC)

Here's something I didn't even notice the first time through your writing: "it's about picking this problem to illustrate a perhaps interesting, but unrelated concept." "Unrelated"?? That's absurd!!! This is obviously not about picking the birthday paradox to illustrate something unrelated; this is obviously about how some standard topics from the first couple of years of undergraduate mathematics can illuminate a topic that might otherwise appear to be only about concrete numbers. Michael Hardy 01:25, 3 October 2005 (UTC)

No matter how many exclamation marks you use, how many times you say "obviously", and how many times you say that comments that disagree with you are "absurd", this section remains in my (and others') opinion largely unrelated. Certainly much less related than gazillions of things that could be said about this problem/paradox.--PizzaMargherita 01:49, 3 October 2005 (UTC)

So are you claiming that anything that treats the problem by any means other than secondary-school-level mathematics is "unrelated"? Michael Hardy 02:03, 3 October 2005 (UTC)

Actually, not necessarily "unrelated", but more like "inappropriate". The mathematics sections of Wikipedia have this problem: how formal do you get before you've gone too far and made the article nonencyclopedic? For instance, the first incarnation of Database normalization was a real pickle to read: all of it was formal definitions of each normal form. As the article progressed, these formal definitions were removed and replaced with more "layman"-term explanations. This article doesn't seem to have this problem: it appears to be clear enough to the ordinary person, but it comes down to the question: is the formal proof too burdensome for the purposes of an encyclopedia? -- Ambush Commander(Talk) 02:17, 3 October 2005 (UTC)
"So are you claiming that anything that treats the problem by any means other than secondary-school-level mathematics is "unrelated"?" No. I have absolutely no problems with the level of mathematics in the section, which frankly IMO is not much different from the level of the rest of the article. The problem is that this section is about pedagogy of mathematics. It does pick this problem (like it could have picked another one) to illustrate the point, but it adds virtually no value to this article, and is therefore... how shall I put it... inappropriate (thanks Ambush Commander).--PizzaMargherita 06:50, 3 October 2005 (UTC)

Most Wikipedia articles on mathematics, on which hundreds of mathematicians have worked, are far less "layman"-oriented than any part of this. Would you delete all of those? This article, including the section on Halmos' observations, is limited to lower-division undergraduate material. So if it goes at all beyond secondary-school level, it should not be here -- it should just get deleted? This should be an encyclopedia of high-school math? Lower-division undergraduate math is insufficiently "layman"-oriented for inclusion in an encyclopedia? Michael Hardy 03:03, 3 October 2005 (UTC)

I recommend removing the Halmos quote from that section. It appears to me to be simply espousing his POV beliefs about how mathematics should be done (even though I tend to agree with him), rather than adding anything immediately relevant to this article. In this context it does seem "showy", and would better be placed in an article about Halmos himself rather than here. - Gauge 03:17, 3 October 2005 (UTC)

I'm glad I write about topics that most people don't understand. Less controversy that way. ;-) FWIW, Michael Hardy, Dm Harvey and Gauge are some of the senior mathematicians on WP, so I take their opinions seriously. And Paul Halmos was famous, and so surely his ideas count for something too. linas 04:20, 3 October 2005 (UTC)

You are asked for your opinion here. Do you have one?--PizzaMargherita 06:50, 3 October 2005 (UTC)

Here are some of my thoughts:

  1. I like the section, and I think it most definitely should stay in the article. It seems important, relevant and appropriate.
  2. I like the new title "A conceptual rather than computational view" much better than the old one. So thanks to Pizza and Michael for working together successfully on that.
  3. I have no problem with the Halmos quote. I find it to be a relevant quote explaining the importance of this argument in helping to provide a fuller understanding of the birthday problem.

Paul August 04:23, 3 October 2005 (UTC)

So you find it relevant/appropriate/related/important. Could you please explain why it is more so than the "Applications" section (for instance), thus deserving a higher position in the article?--PizzaMargherita 06:50, 3 October 2005 (UTC)
Well I can't explain why this section is more important than the applications section, since I haven't formed an opinion on that. However I don't think there is necessarily a 1-1 correspondence between position and importance. Paul August 16:08, 3 October 2005 (UTC)

May I suggest that we are actually debating two separate issues here. We might all be better served by splitting the conversation into two pieces as follows.

On the merits or otherwise of including Halmos's mathematical argument

I've already stated that I think Halmos's argument should stay. Wikipedia can be both a general-audience encyclopaedia, and an encyclopaedia aimed at people with more background (in this case mathematical), as long as we are careful how we organise the material. The relationship between Derivative and Derivative (generalizations) is an excellent example. We would be doing many of our readers a grave disservice by leaving out Halmos's discussion. Dmharvey Talk 13:21, 3 October 2005 (UTC)

I strongly agree with what Dmharvey has said here. Paul August 16:08, 3 October 2005 (UTC)
Agree that the mathematical argument should stay, in one form or another. - Gauge 03:44, 6 October 2005 (UTC)
(Well done for identifying the two separate issues.) I think that "We would be doing many of our readers a grave disservice by leaving out Halmos's discussion" is an overstatement. Ok, I don't rule out completely that there may be a little value in the mathematical argument, but it needs a lot of work for this value to be brought out. In that sense, the title and the opening sentence don't make it clear at all what the section is all about, and (sorry to repeat myself) why it's relevant there. Why would I read it? Why would I skip it? It all sounds insipid and bombastic at the same time. I will try to come up with a more "to-the-point" version that preserves the math.--PizzaMargherita 17:41, 3 October 2005 (UTC)
I have mixed feelings about the value of including the argument in this article. I agree with Halmos that it is an opportunity to see some interesting and valuable mathematical techniques. For example, in computational complexity theory such methods are vital; likewise in analysis. My reservations concern whether this is an appropriate place to be doing that teaching. I lean in favor of retaining the material, because I don't think it will hurt the article for a general audience, and it may help some.
That said, the exposition was a mess. Especially troubling was the mix of equalities and inequalities, and the lack of almost any explanation of the substitutions. I have rewritten the section — at the cost of more length — to try to explain what we're doing, why it's worthwhile, and to do it more cleanly. I hope the mathematicians like it, and am especially interested in PizzaMargherita's response to the rewrite. --KSmrqT 21:09, 3 October 2005 (UTC)
"The exposition was a mess"? Your re-written version is written as if you're trying to explain some really basic parts of advanced secondary-school level math to students at that level, and even if that's appropriate, it hardly means that it's a "mess" if it's presented in a manner intended to be convenient for people who know math well. As it was written, it wasn't so far from the way Halmos wrote it, and Halmos has a well-deserved reputation as a clear expositor. A "mix of equalities and inequalities", such as
etc., is a pretty standard way of writing things, because it's efficient and avoids undue wordiness. (The "substitutions" were explained, albeit only after the sequence of equalities and inequalities.) Michael Hardy 23:27, 3 October 2005 (UTC)
(Indented Hardy response.) Perhaps you'd rather wait and let PizzaMargherita rewrite the "insipid and bombastic" version, or delete it? ;-)
But let me address one subtlety, because I think it's important. If we mix equalities and inequalities in multiline form,
A = B
< C ,
= D ,
a quick skim suggests that A = D, especially with many lines and lengthy expressions, so that a reader may look at the first and last lines for a summary. Such a misunderstanding is less likely in the case of brief one-line form,
A = B < C = D.
What we saw was a mix of equalities and inequalities, and also a mix of one-line and multiline forms. It was not a considerate exposition even for readers who know mathematics well. Could we slog our way through it? Yes, but that doesn't make it good writing. IMHO. --KSmrqT 01:07, 4 October 2005 (UTC)
Ah, the joys of original sources, or nearly so. Digging back through the history, I find that Michael Hardy introduced the section purely as a longer quotation from Halmos, but gave it a heading more inflammatory than the text. Later, Halmos' error was pointed out, and various attempts were made to correct it. Except for the error I like Halmos' version even more than mine, and prefer either one over most of the intervening "fixes", which are not so appealing. --KSmrqT 06:31, 4 October 2005 (UTC)
I don't have time to read the review (or should I call it "preemptive strike"? ;-)) carefully at the moment. Just a few comments.
  1. The explanation is certainly clearer now, though at times I think it's excessive (e.g. (e^a)^b = e^(ab)). Some of you got the idea that I find the section too advanced. It's not that; the maths is not difficult. It was just badly presented.
  2. There is considerable overlapping with the previous sections. In particular:
    • "Calculating the probability" would do with a more explicit mention of the fact that p and p̄ are probabilities of two complementary events, and this should not be repeated in Halmos' section. The same symbols are used, therefore what they refer to is understood.
    • "Calculating the probability" would also do with the "product" formula of the probability p, (i.e. a non-expanded version), which should be referred to in Halmos' section.
    • The sections above are generic in d, why should this section not? "Because Halmos..." I don't care what Halmos originally wrote. This is the WP, not his book, and we are writing it. Once we say that this section is adapted from such and such source (and this is all we should write about it IMO), we can write all we want. And by the way, for the same reason, we should use only one form of "log/ln" across the article, no, across the WP. By the way, which convention does WP adopt, if any?
  3. I'm starting to think that the same result can be readily derived from the Taylor expansion approximation of p in the "Calculating the probability" section above. In other words, the fact that e^(-(n^2-n)/(2d)) is an upper bound of p, and not only an approximation, could possibly be derived directly by looking at the Taylor series.
--PizzaMargherita 07:04, 4 October 2005 (UTC)
Ok, I modified the title and intro to something more to-the-point. Given the feedback on Halmos and the pedagogic aspect, I confined it to the footnote. Comments welcome.--PizzaMargherita 07:04, 4 October 2005 (UTC)
Points 2a, 2b and 2c are done (since I didn't hear any objections). As for point 3, it actually applies. I don't know how it escaped me for so long, but the whole argument can be explained by the 1 − x < e^{−x} inequality, given the explanations of the approximation that we have already given in the first section. I find this approach to the same result (the upper bound) more direct and linear than what we have atm. I appreciate that switching to this simpler argument would get rid of most of Halmos' stuff, so please voice your opinion.
To be clearer, my proposal is:
That's all that's needed.--PizzaMargherita 06:59, 6 October 2005 (UTC)
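The formula for the proposal did not survive in this archive. For reference, the 1 − x < e^{−x} inequality mentioned above leads to an upper bound along the following lines. This is a sketch of the standard argument, not necessarily the exact wording that was proposed; the symbol \bar{p}(n) for the probability that n people with 365 equally likely birthdays all differ is an assumption of this note.

```latex
\bar{p}(n)
  = \prod_{k=1}^{n-1}\left(1 - \frac{k}{365}\right)
  < \prod_{k=1}^{n-1} e^{-k/365}
  = e^{-\left(1 + 2 + \cdots + (n-1)\right)/365}
  = e^{-n(n-1)/730},
  \qquad n \ge 2.

% The bound is below 1/2 as soon as n(n-1) > 730 \ln 2 \approx 505.997,
% and 23 \cdot 22 = 506, so n = 23 suffices.
```

Since 22 · 21 = 462 falls short of 505.997, the bound by itself says nothing about n = 22; that case needs the exact product.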
Using generic d seems unnecessary clutter here, so I do object to that. I strongly object to the latest proposal, and also to a prior edit removing introductory sentences, both for the same reason. Namely, the point of this section is not merely to derive a bound (we've already done that).
No, we haven't.--PizzaMargherita 09:55, 6 October 2005 (UTC)
If that were the game, we could kill the entire article except for a sentence that says "and the answer is: 23". What is the point being made by Paul Halmos, Michael Hardy, Dmharvey, Paul August, Gauge, and me (and others, I think)? Let's try an experiment: Please state your understanding of why we want this section. (Hint: It was explicitly stated in those sentences you removed.) --KSmrqT 08:24, 6 October 2005 (UTC)
Please do me the courtesy of leaving my talk page words intact, as they were written, with signature. Thanks.
I just moved the sentence in a new section.--PizzaMargherita 08:53, 8 October 2005 (UTC)
Now, what part of "I strongly object" was not clear?! Why did you ask for opinions on your proposal if you were going to ignore them? Your edit flies directly in the face of my insistence that the point is not merely to derive a bound. (Whether we've done it or not is irrelevant.) I'm quite serious when I ask you to state your understanding, because your behavior indicates that we are not on common ground. --KSmrqT 02:11, 8 October 2005 (UTC)
I have stated my understanding in a concise way in the version that you have removed, as well as in this talk page. I ignored your objection (the only one) because it made it clear that you had completely missed the central mathematical argument - which you may find irrelevant (your missing the argument, not the argument itself), but I do not. So you're right, we were not on common ground, but you seem to have caught up. (Out of interest, was it thanks to my explanations?) I believe that the consensus is to drop the pedagogic part and concentrate on the mathematical contents. So why use an "important inequality" like the one about the arithmetic and geometric means, when it's clearly unnecessary?
A list of things that I don't like about the section as it stands:
  1. The title "Implications of inequalities" is too vague. The expression "upper bound" should appear in it.
  2. The opening "For variations of the birthday scenario in broader contexts, a different flavor of argument is essential." is even more vague, and IMHO redundant, as it doesn't bring in any value.
  3. "The general idea is..." to find an upper bound. "If our final expression..." Our writing should be concise.
  4. As I said, the arithmetic/geometric inequality can go without affecting the mathematical argument
  5. If we have to have 365 instead of d, then we should keep "2x365", as opposed to "730". We don't like numerical stuff, remember?
  6. Several explanations are so verbose they are distracting, and so detailed they are patronising. Why don't we explain the meaning of "+" in every article?
  7. Use of "log" vs "ln" is inconsistent with the rest of the article. Or vice versa.
--PizzaMargherita 08:53, 8 October 2005 (UTC)
Concensus is not voting, but coming to a common understanding. Ignoring my objection is not concensus. And if we were voting, we would not count your vote as many times as you have stated your opinion; I count about half-a-dozen voices disagreeing with you throughout this discussion, often vehemently. Most voices support pedagogy; it's yours that persistently opposes it. We love mathematics, we teach it, we understand the importance of the inequalities and the flavor of argument using them. Yet for that view you show only contempt. You did not respect Michael Hardy when he said it, you did not respect several other mathematicians when they said it, and you ignore me when I say "I strongly object". And by the way, I suggest you read what's been said more carefully, because I was not asking about your understanding of the derivation of the bound; I asked "Please state your understanding of why we want this section", which is another matter altogether. The correct answer is, we want it for the pedagogy, which is exactly the opposite of your view of the concensus. Apparently you do not respect our view, so you ignore it and substitute your own. That's not how we like to do things at Wikipedia, nor for that matter in most of the real world.
If you are willing to accept pedagogy, rather than merely deriving a bound, as a goal of the section, then we can discuss refinements; otherwise, we are wasting our time talking at cross purposes. I suggest you accept, because, frankly, the only real justification for the entire article is pedagogy. --KSmrqT 05:27, 9 October 2005 (UTC)
"Ignoring my objection is not concensus." You are right, it's not. And it's not even "consensus", for that matter. It's assuming that people who have actually understood what we have decided to write about (the mathematical argument, see the title of this subsection and the quotations below), and are therefore entitled to write about it, did not have any problems with my proposal.
"I count about half-a-dozen voices disagreeing with you throughout this discussion, often vehemently." See, if they don't type I have trouble seeing and hearing them from my workstation.
"Most voices support pedagogy; it's yours that persistently opposes it." Interesting point of view. Who said the following? (Hint: not me.)
  • "I don't see any reason that this particular article should discuss pedagogy in this level of generality where most of our mathematics articles do not."
  • "I agree that Halmos is talking about pedagogy, and to that extent it is out of place, here."
  • "[the whole Halmos section] seems like original research rather than something that really belongs in an article like this one."
  • "it would be better, I think, to jump to the conclusion and present the mathematical (as opposed to numerical) view"
  • "I encourage someone to replace [the section] with a more direct proof"
  • (my favourite) "My reservations concern whether this is an appropriate place to be doing that teaching."
Rest assured that when most people disagree with me, I am (unlike other people) capable of recognising it and accepting it (see generalisation section below).
"You did not respect Michael Hardy when he said it". Uhm, perhaps because he showed no respect to who was opposing his view? I refer you to the beginning of this talk section.
"If you are willing to accept pedagogy, rather than merely deriving a bound, as a goal of the section, then we can discuss refinements". Most refinements (my 7 points above) can be discussed even without an agreement in this sense.
--PizzaMargherita 09:47, 9 October 2005 (UTC)
8. Another thing that I think should be mentioned ("essential" for a broader-flavoured understanding with different blah-blah-contexts in variations of arguments, or something like that) is that the formula we end up with is the same as the approximation in the previous section, which therefore is not only an approximation, but also (guess what?) an upper bound.--PizzaMargherita 06:17, 10 October 2005 (UTC)
Would it be fair to say that nobody objected to the fact the arithmetic/geometric inequality is not needed, and the same result can be achieved without it? If so, I shall proceed and take it out.--PizzaMargherita 11:41, 24 October 2005 (UTC)
Hoping to meet everyone's tastes, I'll change the title of the section to "An upper bound and a different perspective" and will remove the introductory sentence. PizzaMargherita 23:50, 1 November 2005 (UTC)
If nobody objects, I'll effect the changes proposed in the points 3, 5, 6, 7, 8 above. The text is now out of sync with the derivation. PizzaMargherita 12:32, 10 November 2005 (UTC)

On the merits or otherwise of including Halmos's quotation

I've already stated that I don't think Halmos's quotation should remain in full. I don't see any reason that this particular article should discuss pedagogy in this level of generality where most of our mathematics articles do not. Dmharvey Talk 13:21, 3 October 2005 (UTC)

I think that Halmos' quote is doing two things at once. I agree that Halmos is talking about pedagogy, and to that extent it is out of place, here. But at the same time he is also explaining how his argument helps to provide a fuller understanding of the birthday problem, and I think this latter point is wholly relevant here. I don't really see how we can get rid of the former and keep the latter. Perhaps it might be better to move this quote to a note at the bottom. I will be bold and do this, and see what folks think. Feel free to revert if anybody doesn't like it. Paul August 16:08, 3 October 2005 (UTC)
As far as I'm concerned, the pedagogic part should go. The aspects that are not irrelevant are inappropriate. If we want to attribute the mathematical argument above to Halmos - fine, but I wouldn't spend more than one sentence (or perhaps just a mention of the name) on that. As I said, I'll give it a go. Putting a footnote may be the right direction, thanks Paul.--PizzaMargherita 17:52, 3 October 2005 (UTC)
The footnote seems to be a good compromise if some people insist on keeping the quotation. Personally I am not bothered by how the article reads now. - Gauge 03:48, 6 October 2005 (UTC)

Generic d-formulae vs. specific 365-formulae

Using generic d seems unnecessary clutter here, so I do object to that.

Let's see: "d" = 1 character, "365" = 3 characters. Where's the clutter again?--PizzaMargherita 09:55, 6 October 2005 (UTC)
Wait, I may understand what you mean: you don't like p(n;d), you'd rather see p(n). I agree on that part, it's awkward to carry it around, and it's evident from the formula that d is a parameter. I propose we define it as p(n;d) the first time around and we explicitly say that in what follows it's understood that by p(n) we mean p(n;d). This is notwithstanding the proposal below, to give the 365-specific solution at the very top of the article.--PizzaMargherita 09:55, 6 October 2005 (UTC)

I strongly object to having d instead of 365 in the introduction. The introduction should be completely accessible to anyone who can multiply fractions. The best thing about this problem is its accessibility; changing it to d loses some of that. (It also should not use product notation.) Dmharvey Talk 12:02, 6 October 2005 (UTC)

The introduction has been that way for some time now... you mean the introduction of the article, right? In that case, I can see where you are coming from, as I agree that we shouldn't scare people away. We could give the first formula for 365 and right after state the more general result and carry on with d. What do you think? I'm sure those interested won't have a lot of trouble substituting d with 365... not any more than understanding Taylor series truncations or the probability of complementary events. And the importance and applications of this problem have nothing to do with birthdays. That's why I think that general d-results will be more useful to people who land here.--PizzaMargherita 12:24, 6 October 2005 (UTC)

Any competent mathematician can generalize the argument from 365 to "d" easily enough themselves. The point is that this article is about the birthday paradox, meaning that the focus should be on 365. If PizzaMargherita wants to write another article on generalizations of this approach and their applications, that is fine with me, but changing 365 to "d" here just adds another hurdle for the interested reader. - Gauge 03:21, 8 October 2005 (UTC)

Indeed! Paul August 03:54, 8 October 2005 (UTC)

Fine. So if you like the proposal above I'll do that when I have some time.--PizzaMargherita 09:03, 8 October 2005 (UTC)

Though I disagree with Gauge when he says that "changing 365 to d here just adds another hurdle for the interested reader". The reader may well come from Birthday attack or Hash tables, and she would want to find the generic formula right there. I don't find the "hurdle" of substituting incredibly difficult. I'll do (or rather, restore) a "generalisation" section.--PizzaMargherita 09:03, 8 October 2005 (UTC)

Near matches

Is the table in Near matches really correct? It doesn't match with my calculated probabilities for near matches, nor my empirical tests. Also, unless I'm doing something seriously wrong, one is not required to use the inclusion/exclusion principle to calculate this.

For k=5, 8 ppl are required (59.3158%), k=6, 7 ppl (55.0188%), k=7, 7 ppl (60.6174%). --Yarin 20:43, 6 August 2005 (UTC)

Could somebody give the details of these calculations to evaluate? --Neshatian 12:12, 9 May 2006 (UTC)
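One closed form that reproduces the figures quoted above is sketched below in Python. This is an assumption about the kind of calculation involved, not a record of how those figures were actually obtained: it takes m = 365 equally likely days arranged in a circle, reads "near match" as two birthdays at most k days apart, and uses 1 − (m − nk − 1)! / (m^(n−1) (m − n(k+1))!), written as a product to avoid large factorials. The name `p_near_match` is illustrative.

```python
def p_near_match(n, k, m=365):
    """Probability that some two of n birthdays fall within k days of each other.

    Assumes m equally likely days on a circular calendar (an assumption of
    this sketch). For k = 0 this reduces to the ordinary birthday problem.
    """
    if n * (k + 1) > m:
        # too many people to keep everyone more than k days apart
        return 1.0
    prob_none = 1.0
    for i in range(1, n):
        prob_none *= (m - n * k - i) / m
    return 1.0 - prob_none

for n, k in ((8, 5), (7, 6), (7, 7)):
    print(f"k={k}, n={n}: {100 * p_near_match(n, k):.4f}%")
```

For (n, k) = (8, 5), (7, 6) and (7, 7) this gives roughly 59.3%, 55.0% and 60.6%, in line with the percentages quoted above, and no inclusion/exclusion step is needed.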

Editing proposal

I propose moving the computer code to a separate article that would serve as an appendix, to which this article would link. That would help keep this article readable. Michael Hardy 02:58, 3 Mar 2005 (UTC)

A good idea - please would you resurrect the deleted versions from the edit history when you do. -- ALoan (Talk) 10:35, 3 Mar 2005 (UTC)
I'd prefer to leave the current programs in the article, I think they're more useful to many readers than the math discussion. Daniel Quinlan 09:47, Mar 5, 2005 (UTC)
I disagree, a pseudocode rendition would be the most useful while a version in each of 5 languages is just a mess. I'm removing all but one. --Gmaxwell 17:31, 6 Jun 2005 (UTC)
Well, leaving the python version was obviously going to cause trouble :-) I had a stab at writing a Wikipedia:wikicode equivalent. Apart from not knowing how to correct 'print', it seems correct. Richard W.M. Jones 22:25, 6 Jun 2005 (UTC)
Should the various code versions not be moved to Wikisource - they seem to have adopted that solution at Monty Hall problem. IIRC there are half a dozen versions in various languages in the edit history.-- ALoan (Talk) 30 June 2005 21:47 (UTC)

blog link

http://inclinedtocriticize.blogdrive.com/archive/240.html

Why is that there? It's neither particularly enlightening nor detailed.

Good now

I like the current article quite a bit, I'd prefer to leave the current computer examples, I think they are at least as accessible by the average reader as the mathematics, if not more accessible. So, nice job, Michael. That being said, I'm not impressed by your repeated insults, though. Not cool. Daniel Quinlan 09:47, Mar 5, 2005 (UTC)

Why 100%

The main page assumes no leap years. Therefore, once all 365 days have been taken, the next person has to be on one of the days already used. Hence, 100%. Superm401 | Talk 01:47, Jun 4, 2005 (UTC)

Merge

I would do the merge, provided there is no copyvio. Superm401 | Talk July 1, 2005 12:13 (UTC)

Mistake in the first equation?

My maths may be poor, but the first equation to calculate the probability p that the n birthdays are different seems wrong. The equation given is:

However according to the BIDMAS rules the addition for the last fraction will happen before the subtraction, so the equivalent equation is:

And I think it should be:

Any comments?

That's not correct. Under standard order of operations, subtraction and addition are at the same level. That means the order of evaluation is left to right. Therefore, the equation in the article and your bottom equation are equivalent. Superm401 | Talk 11:57, July 15, 2005 (UTC)

365 − n + 1 means (365 − n) + 1, not 365 − (n + 1). That is universally standard. What in the world is "BIDMAS"?? A programming language? Or is it one of those mnemonics by which children learn math conventions? Michael Hardy 22:48, 15 July 2005 (UTC)

Hmm, something interesting, why isn't it: :? — Ambush Commander(Talk) 23:48, July 15, 2005 (UTC)
The format in the article is more consistent with the actual logic behind the equation. There's no reason to change it.Superm401 | Talk 01:25, July 16, 2005 (UTC)
The article still needs to be changed to remove the ambiguity. The average reader cannot possibly be expected to work out the order of evaluation from a convention that they know nothing about, regardless of whether it is technically correct or not.   Lee J Haywood 06:56, 16 July 2005 (UTC)
I've added parentheses, does that resolve the ambiguity? — Ambush Commander(Talk) 13:15, July 16, 2005 (UTC)
If adding parentheses, it should be , not because the latter is not following the logic. And I seriously cannot imagine how a person (even that average reader) cannot know about order of evaluation of plus and minus operations, sorry. --Paul Pogonyshev 17:14, 16 July 2005 (UTC)
Michael, BIDMAS = Brackets, Indices, Division, Multiplication, Addition, Subtraction. That's how people are often taught to evaluate expressions (maybe it's restricted to the UK?), the mistake being that division and multiplication evaluate left to right together, followed by addition and subtraction evaluate left to right together.
I learned the same rules, just with the name of PEMDAS, for Parentheses, Exponents, Multiplication, Division, Addition, Subtraction. Superm401 | Talk 17:19, July 24, 2005 (UTC)
I've taught math at five different universities, including one where nearly all students were academically weak and also including MIT, the others being somewhere in between, and I've done quite a lot of private tutoring of both academically weak and fairly gifted students, and I've never heard of BIDMAS or PEMDAS. I have heard that some people use mnemonics in attempting to learn mathematics, but I've always ignored that. One confusion I fear might happen as a result of these rules is that people might think that a^b^c means (a^b)^c, whereas in fact a^b^c = a^(b^c). Michael Hardy 18:17, 24 July 2005 (UTC)
First of all, just because someone originally learned with a mnemonic device doesn't mean they will never understand the fundamental concepts. PEMDAS was useful in introducing me to order of operations. I also disagree that it will cause the error described above. On the contrary, PEMDAS is perfectly correct even when taken literally with no thought. We first notice there are no parentheses. Then, we move on to exponents. The exponent of the first power, moving left to right, is b^c. Hence, we evaluate that first, getting (b^c). We then see (moving left to right) another power, a^(b^c). We evaluate that, getting the right answer. Superm401 | Talk 04:13, July 25, 2005 (UTC)

Well, on my browser, the a appears (below and) to the left of the b, and the b appears (below and) to the left of the c. And that's how I've always seen it written. So I really don't understand your argument about that. Michael Hardy 20:47, 25 July 2005 (UTC)

Of course it is. Just as 3 is the base of 3^5 because it is below and to the left, a is the base of a^b^c because it is below and to the left. The base is evaluated after the exponent. That's what I'm trying to say. Superm401 | Talk 21:25, July 25, 2005 (UTC)
Your rule is right if the base is evaluated after the exponent, but you said left-to-right, and left-to-right appears to suggest evaluating the base first. Michael Hardy 23:34, 3 October 2005 (UTC)
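Both conventions discussed in this thread can be illustrated with a language that follows them. The snippet below is a small Python sketch (the values n = 23 and a, b, c = 2, 3, 2 are arbitrary choices for illustration): addition and subtraction share one precedence level and group left to right, while Python's power operator `**` groups right to left.

```python
n = 23

# Subtraction and addition are at the same level, evaluated left to right.
left_to_right = 365 - n + 1
assert left_to_right == (365 - n) + 1 == 343
assert left_to_right != 365 - (n + 1)

# Exponentiation groups from the right: a**b**c means a**(b**c).
a, b, c = 2, 3, 2
tower = a ** b ** c
assert tower == a ** (b ** c) == 512
assert (a ** b) ** c == 64
```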

Merge

The link in the merge template on the page refers here, but I don't think anyone has actually started to talk about it yet. I don't think that page should be merged. I'm doubtful about whether it should even be kept. It uses obscure abbreviations, doesn't explain itself, and the table may be a copyvio. Furthermore, I don't think the idea of a birthday distribution is nearly as common as the "paradox" itself. I'd like to remove the merge tag. What do others think? Superm401 | Talk 17:26, July 16, 2005 (UTC)

It looks like a copyvio of http://www.mathcad.com/library/LibraryContent/puzzles/soln28/exact28.html --Audiovideo 14:19, 19 July 2005 (UTC)

Empirical test

Empirical test in article is not a simulation. It is just some ready formula. Here is real test (in c#):

Random rnd=new Random();
int total_pairs=0;
int tries=1000;
for (int t=0;t<tries;t++)
{
	//pick random birthday for every person
	ArrayList persons=new ArrayList();
	for (int i=0;i<23;i++)
		persons.Add(rnd.Next(365));
	int same_birthdays=0;
	//check for birthday pairs
	for (int a=0;a<23;a++)
		for (int b=0;b<23;b++)
		{
			if ((int)persons[a]==(int)persons[b] && a!=b)
				same_birthdays++;
		}
	if (same_birthdays>0)
		total_pairs++;
}
Console.WriteLine("chance of same birthday pair for 23 persons: "+(double)total_pairs/tries*100+"%");

Exe 22:05, 31 July 2005 (UTC)

Your edit has been reverted by User:Richard W.M. Jones. Although he described the edit as "User:Exe managed to slip an obvious POV change to using C# - reverted to using wikicode", he failed to drop a notice on the talk page. Perhaps we can compromise and change the title of the section to something else? — Ambush Commander(Talk) 00:30, August 5, 2005 (UTC)

Reverse problem

I have a couple of problems with the "Reverse Problem" section, but I'm not too sure how to fix them.

  1. I think there is a little notation inconsistency, in that p and n are both functions and variables. I don't know how to put it, but I have the feeling that in the same sentence p is both a given point and a function.
  2. Two problems are stated, but one solution is given. I think it would be nicer to state only one problem, and perhaps mention that it's a quantile kind of problem. By the way, in my [Mood, Graybill, Boes] the quantile is defined in a completely different way than in the WP article.
The q-th quantile of a random variable X or of the corresponding distribution is defined as the smallest number ξ that satisfies F_X(ξ) ≥ q.
I guess this is a comment for that other page...--PizzaMargherita 20:27, 3 October 2005 (UTC)

links removed

I just removed two links from the article:

feel free to flame me, if you think that was a bad idea --J.N. 15:48, 24 October 2005 (UTC)

Klamkin (1967)

The birthday problem for such non-constant birthday probabilities was tackled in [Klamkin 1967]. What are the results presented in this paper? In particular, it is reasonable that for non-constant birthday probabilities, the probability of two birthdays on the same date is higher than in the case of constant probabilities. Is this result proved in that paper? Does someone know a reference where this proof can be found? --NeoUrfahraner 08:30, 7 November 2005 (UTC)

I'm pretty certain that you are right: non-constant birthday probabilities lead to strictly higher collision probabilities (except for a couple of trivial exceptional cases, where the collision probabilities are the same). I haven't seen Klamkin's paper, but it seems likely that he proves this result there. AxelBoldt 16:00, 8 November 2005 (UTC)
I found a proof in D. Blom, AMM, 1973. See references. --NeoUrfahraner

birthday distribution

"it becomes relevant that due to the way hospitals work, more children are born on Mondays and Tuesdays than on weekends."

Er, really? Do hospitals suppress labor if it occurs on the weekend? -VJ 08:11, 5 January 2006 (UTC)

I took this to mean that labor is more often induced near the beginning of the week, for some reason. --RCS talk 05:40, 18 January 2006 (UTC)

It does look dubious. Can anybody back that with some reference? I would feel more comfortable if this made its way to the childbirth article first. PizzaMargherita 07:04, 18 January 2006 (UTC)

Here are some German links: http://www.welt.de/data/2005/09/30/782511.html http://www.kinderwelten.de/cgi-bin/websql?sqlid=18349&nid=1019 They say the reason is e.g. that Caesarean sections become more popular. --NeoUrfahraner 10:24, 19 January 2006 (UTC)
Here is a better one in English with data from England and Wales: http://www.statistics.gov.uk/downloads/theme_health/HSQ9book_V1.pdf There is an obvious seven day cycle, with fewer births on Sundays compared with births on other days of the week. The average number of births on a Sunday during 1979 was 1,373, with a standard deviation of 60, whereas the overall daily average in 1979 was 1,748, with a standard deviation of 211. (Page 7). --NeoUrfahraner 11:24, 19 January 2006 (UTC)

Thanks :) PizzaMargherita 23:41, 21 January 2006 (UTC)

Suggestion for another table

This article could address another very common curiosity: the expected range of birthday matches for certain group sizes. To phrase it another way: present a table showing group size X and the expected collision range N to M, where the odds are even that at least N and at most M birthdates will recur among those X people. Alternately, a link to a "birthday paradox calculator" applet including this feature (among others) would be similarly useful (this is different from the current link to a Mac-OS birthday grapher application). -- Mike 17:15, 11 January 2006 (UTC)

Increment numPeople before or after

A while back User:205.170.235.246 changed the empirical test so that the increment would be performed after all operations, rather than before. I reverted the edit, on the grounds that: revert with reluctance: I that the increment operator is supposed to before the logic. It's a bit weird, I agree. In this edit, the anon has done it again.

So, I hacked out a quick test using PHP this way:

// append ?alt to URL to use anon's method
$days = 365;
$numPeople = 1;
$prob = 0.0;
while ($prob < .5) {
    if (!isset($_GET['alt'])) $numPeople++;
    $prob = 1 - (1-$prob) * (1-($numPeople-1) / $days);
    echo "Number of people: $numPeople<br />";
    echo "Prob. of same birthday: $prob<br /><hr /><br />";
    if (isset($_GET['alt'])) $numPeople++;
}

and I determined that whether the increment came before or after the calculation made no difference to the final result. However, when it was at the back (anon's method), a probability was also given for the case of 1 person (obviously 0).

The reason why this is weird, however, is that in most cases the increment is indeed performed after the loop body, as this for statement illustrates:

$days = 365;
$prob = 0.0;
for ($numPeople = 1; $prob < .5; $numPeople++) {
    $prob = 1 - (1-$prob) * (1-($numPeople-1) / $days);
    echo "Number of people: $numPeople<br />";
    echo "Prob. of same birthday: $prob<br /><hr /><br />";
}

We can also alleviate the aforementioned concerns by beginning the loop with $numPeople = 2, starting with the first "logical" case.

In the end, I am more for a solution that uses for rather than while. Any comments? — Ambush Commander(Talk) 22:19, 27 January 2006 (UTC)

I'm with you. I would change it to a for. Superm401 - Talk 01:44, 28 January 2006 (UTC)
Changed. — Ambush Commander(Talk) 20:37, 28 January 2006 (UTC)
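For comparison, the same recurrence with the loop starting at two people can be written as follows. This is a Python sketch of the idea settled on above, not the code that went into the article; `smallest_group` and its parameters are hypothetical names.

```python
from itertools import count

def smallest_group(days=365, threshold=0.5):
    """Return (group size, probability) for the first group size at which the
    probability of a shared birthday reaches the threshold."""
    prob = 0.0  # with a single person there is nobody to match
    for num_people in count(2):  # start at the first "logical" case
        # the newcomer must avoid the num_people - 1 birthdays already taken
        prob = 1 - (1 - prob) * (1 - (num_people - 1) / days)
        if prob >= threshold:
            return num_people, prob

size, prob = smallest_group()
print(f"{size} people: probability {prob:.4f}")
```

Starting the counter at 2 avoids the question of whether to increment before or after the calculation, because the counter always equals the group size the probability refers to.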

Make it really empirical

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>

int main()
{
        int tries=1000;

        std::srand((unsigned)std::time(0));
        int total_pairs=0;
        for (int t=0;t<tries;t++)
        {
                //pick random birthday (0..364) for every person
                std::vector<int> persons;
                for (int i=0;i<23;i++)
                        //scale to [0,1) first: 365*rand() can overflow int
                        persons.push_back((int)(std::rand()/(RAND_MAX + 1.0)*365));
                int same_birthdays=0;
                //check for birthday pairs
                for (int a=0;a<23;a++)
                        for (int b=0;b<23;b++)
                        {
                                if (persons[a]==persons[b] && a!=b)
                                        same_birthdays++;
                        }
                if (same_birthdays>0)
                        total_pairs++;
        }
        std::cout<<"chance of same birthday pair for 23 persons: "<<(double)total_pairs/tries*100<<"%";
        return 0;
}

It is a real test, simulation, using rand(), not just a math formula. Changed to C++ because someone said that writing in c# is pov. exe 15:54, 28 January 2006 (UTC)

I agree that the current formula is not really an empirical test. It is based on mathematical theory, and while a useful tool to generate the probabilities for certain numbers of people, it is by no means empirical.
But how is C++ any less POV than C#? — Ambush Commander(Talk) 20:36, 28 January 2006 (UTC)

Whose common intuition?

Whose common intuition does it contradict? I asked my mother and my sister, neither of whom is particularly good at maths, whether in their opinion it is more likely that in a group of 23 people all birthdays are different or at least two are equal. Both of them answered the latter. In addition, nobody on buying 23 trading cards of a set of 365 would expect them to be all different. --Army1987 21:04, 8 February 2006 (UTC)

Likely the way you phrased the question. Although we should find a survey. — Ambush Commander(Talk) 21:29, 8 February 2006 (UTC)

OS X program

Hmm... is it linkspam? — Ambush Commander(Talk) 20:42, 11 February 2006 (UTC)

Mistake in binomial distribution

The Birthday_paradox#Binomial_distribution calculation is not valid at all. While it somehow approximates the real probability, it lacks the necessary rationality:

1. The paradox is talking about "at least two persons may have the same birthday". This includes not only pairs, but also 3-person groups, 4-person groups and all 23 persons as well. This calculation is just considering pairs. The cumulative distribution of B(X,253,1/365) equals 1 at X=253 while forgetting all other n-groups.

2. The binomial distribution is only valid for independent events. These 253 possible pairs are not independent. For example, for people A, B, C, ..., having the same birthday for the pairs (A,B) and (B,C) means having the same birthday for (A,C). For instance, Pr(X=252) should equal Pr(X=253) in this context, while the formula does not give that. --Neshatian 11:50, 3 April 2006 (UTC)

Thanks to Michael Hardy, the section was removed. --Neshatian 15:05, 23 April 2006 (UTC)
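The size of the discrepancy raised in this thread can be seen with a few lines of Python. This is a sketch under the assumption that the removed section treated the 253 pairs among 23 people as independent trials, each matching with probability 1/365; the variable names are illustrative.

```python
from math import comb, prod

n, days = 23, 365
pairs = comb(n, 2)  # number of distinct pairs among 23 people

# Treating the pairs as independent trials (not strictly valid, as noted above)
pairwise_approx = 1 - (1 - 1 / days) ** pairs

# Exact probability of at least one shared birthday
exact = 1 - prod((days - i) / days for i in range(n))

print(pairs, round(pairwise_approx, 4), round(exact, 4))
```

The two values are close but not equal, which fits the observation that the binomial calculation "somehow approximates the real probability" without being a valid derivation of it.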

Empirical Test.

I know APPLESOFT BASIC, and some Pascal, so I can recognise this section as a program for a computer.

Consider the non-programmer. How would they know what the heck is going on - this should have an explanatory note.

Corrected Note 1

According to this Wolfram site (which cites a US census table - wish I could find the original source) birthday frequencies are actually skewed to the months of July, August and September (in that order). It makes sense intuitively because those months are roughly 9 months after the winter holidays (more time spent indoors, cheerful festive atmosphere etc.) Extremely interesting article though, I enjoyed it a lot --Cinexero 13:30, 15 June 2006 (UTC)

"Near matches" why so much trouble?

Can't you just divide a year in 52 weeks (52.17 or whatever) and use the same approach you do with 365 days? —Preceding unsigned comment added by 193.198.81.100 (talk) 21:31, 28 April 2009 (UTC)

"Near matches" - careless comment?

It is very careless to say that

"Thus in a family with six members, it is more likely than not that two members will have a birthday within a week of each other."

While i realise that it's tempting to make the group of six people a family, birthdays within a family are generally far from independent, especially if siblings are included!

Given the article it belongs to, i think the statement is out of place.

—The preceding unsigned comment was added by Rileen (talkcontribs) 12:51, 9 August 2006 (UTC).

Good point. Fixed. Thanks. --Keeves 18:33, 9 August 2006 (UTC)
Glad to see that - this was my first (which is why i also forgot to mention my name) teeny-weeny contribution to Wikipedia. Thanks! --Rileen, 11 August 2006

This is a valid statement within the context of the 'uniform birthday distribution' ideal set forth at the start of article. —The preceding unsigned comment was added by Qe2eqe (talkcontribs) 18:17, 17 April 2007 (UTC).

29 February

It makes the article significantly more complex to take into account 29 February, so most of it doesn't — except for bits of the first paragraph. I think the article should simply state early on that 29 February will be completely ignored, and then it should abide by that.

Does anyone object to this?

Ruakh 17:09, 13 August 2006 (UTC)

I had thought of that exact idea yesterday, and I'm glad you suggested it. I have now added such a paragraph, and I hope it will be noticeable enough to bring this back-and-forth to an end. --Keeves 21:32, 13 August 2006 (UTC)
Hmm. Your paragraph is very clear, but I worry that it might be a bit much; after all, it's not really the central point of the article, but really just a caveat clarifying that the article is about an abstract mathematical concept and does not necessarily apply perfectly to real life. It seems like a single sentence could conceivably suffice. (Further, it's not true that "it is just as likely that a randomly-chosen person's birthday will be January 21 or October 5 or almost any other day of the calendar"; different times of year actually have fairly different birth rates.) Ruakh 03:12, 14 August 2006 (UTC)
A footnote should be more than sufficient to account for the leap-year birthdays and just have a further explanation thereof. —The preceding unsigned comment was added by 12.163.97.74 (talkcontribs).
Please see "Another Birthday Paradox" below.Cuddlyable3 (talk) 11:15, 3 October 2008 (UTC)
Folks: My English is not good, but I tried to solve the problem with leap years. I found that the solution needs to know the number of years since the last leap century, which occurs every 400 years; the probability depends on such data. Even so, the problem remains complex. This led me to look for a simple solution, because every other solution needs more data: besides the above-mentioned last leap century, one also needs to know how many years to take as a standard lifetime. So I find that the exact and simple solution is to ask how many people in the room have a birthday on February 29. If there are none, the formula p(n) is adequate. If one person was born on February 29, one has to set n = n − 1 and apply p(n). If there are two or more, p(n) = 1 and END.

Mystery!

"23"! --nlitement [talk] 13:37, 30 September 2006 (UTC)

Approximation based on Taylor Series expansion

Isn't the expression obtained for p_bar(n) ≈ 1 · e^(−1/365) · e^(−2/365) · ... · e^(−(n−1)/365) in the "Approximation" subsection based on the inequality (1 − x) < e^(−x)? It doesn't seem to be based on the Taylor Series expansion of the exponential function; instead, it directly uses the exponential function. —The preceding unsigned comment was added by Wiki user 618 (talkcontribs) 03:13, 9 November 2006 (UTC).

But exp(-x) = 1 - x + x^2/2 - ... IS the Taylor expansion of exp(-x). Cut the expansion off after the term in x, and you get exp(-x) ≈ 1-x. You can see that 1-x < exp(-x) for small x > 0 by looking at the quadratic term of the Taylor expansion (the Taylor expansion is an alternating series). So you are basically saying the same thing, but the article actually shows why the inequality holds. —The preceding unsigned comment was added by 134.58.253.131 (talkcontribs) 11:30, 12 February 2007 (UTC).
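The relationship discussed above can be checked numerically. The following is only a rough sketch, not taken from the article: it compares the exact product of the factors (1 − k/365) with the product of the factors e^(−k/365), which collapses to e^(−n(n−1)/730). The function names are made up for illustration, and uniform birthdays over 365 days are assumed.

```python
import math

def p_bar_exact(n, d=365):
    """Probability that n birthdays are all different (uniform, d days)."""
    prob = 1.0
    for k in range(1, n):
        prob *= 1 - k / d
    return prob

def p_bar_approx(n, d=365):
    """Same product with each factor 1 - k/d replaced by exp(-k/d)."""
    # The exponents sum to (1 + 2 + ... + (n-1)) / d = n(n-1) / (2d).
    return math.exp(-n * (n - 1) / (2 * d))

if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, p_bar_exact(n), p_bar_approx(n))
```

Because each factor 1 − x is no larger than e^(−x), the exact value should come out slightly below the approximation for every n ≥ 2.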

Near matches?

Can we either cite or insert the math for the "near match" birthdays? The information is stated without proof. arctic 00:28, 27 November 2006 (UTC)

You're right to be concerned; I just wrote a quick Perl script to investigate this probabilistically, and it seems like the correct values for the table would be:
within k days # people required
0 23
1 14
2 11
3 9
4 8
5 8 ← not 7, as the page currently has
7 7 ← not 6, as the page currently has
(Note: that was only probabilistic, but I used some basic techniques to ensure accurate results, so I'm fairly confident those are the correct values. As you say, though, a source or mathematical justification is quite necessary here. BTW, lest anyone be concerned that my random number generator might be biased — that's a valid concern, and it might well be, but if that were affecting the results, you'd typically expect greater synchronicity, hence lower values for the number of people necessary to ensure a 50% probability of a near match. The only bias that would produce higher values is a bias against synchronicity; that is, if the generator has a greater-than-random tendency against yielding recently-yielded numbers. I'll run some tests to ensure that that's not the case here, then comment back.)
Ruakh 01:49, 27 November 2006 (UTC)
Okay, I ran some tests, and it doesn't look like my random number generator has any bias against synchronicity. So, I think the table in the article is wrong. Ruakh 02:11, 27 November 2006 (UTC)
The table was indeed wrong, so I corrected it (and cited a source for the data). It's unfortunate that the error remained so long. --Sopoforic 03:16, 21 February 2007 (UTC)
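For anyone who wants to re-check the table without a simulation, here is a minimal sketch using a closed form that can be derived by counting arrangements on a circular year. It is not claimed to be the source cited above, and the function names are hypothetical. It assumes uniform birthdays, a year of m = 365 days that wraps around from 31 December to 1 January, and that "within k days" means the two dates are at most k days apart.

```python
def prob_near_match(n, k, m=365):
    """P(at least two of n birthdays fall within k days of each other)."""
    if n < 2:
        return 0.0
    # n dates need n gaps of at least k+1 days around the circle.
    if n * (k + 1) > m:
        return 1.0
    # P(no near match) = (m - n*k - 1)! / (m**(n-1) * (m - n*(k+1))!),
    # written as a product to avoid huge factorials.
    no_match = 1.0
    for j in range(1, n):
        no_match *= (m - n * k - j) / m
    return 1.0 - no_match

def people_required(k, m=365, target=0.5):
    """Smallest n for which the near-match probability reaches target."""
    n = 2
    while prob_near_match(n, k, m) < target:
        n += 1
    return n

if __name__ == "__main__":
    for k in range(8):
        print(k, people_required(k))
```

With k = 0 this reduces to the ordinary birthday problem, which gives a quick sanity check on the formula.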

Collision counting

This was useful to me testing a password generation programme, and I couldn't find anywhere else in Wikipedia that gave this information. The hash collision article could perhaps link directly to this section.

I suggest that the section's usefulness would be improved by the addition that for 1 << n << d, as is typical, the formula reduces to n²/(2d).

Also, knowing the standard deviation would be useful, as I'd like to assess how reasonable my observed number of collisions is. Jlittlenz 05:07, 8 December 2006 (UTC)

I am also interested in more detail on the counting issue. In particular, rather than just knowing the expected number of collisions, I'm wondering what is the distribution of the maximum number of collisions for any particular day across the year. To put it in hash terms, if you do get a hash collision, you typically use a tree or a linked list to store all the data with the same hash value. The question is, how many items do you have in your tree or linked list, in particular how long is the longest list going to be? It would be good to have a formula p(n,d,m) where n is the number of people/hash entries, d is the number of days per year/size of the hash table, and m is the maximum number of collisions. For example p(1,d,0) = 1 for all d, since there will be 0 collisions with only one person! Conversely p(1,d,m) = 0 for all m > 0, more generally p(n,d,m) = 0 for all m >= n—Preceding unsigned comment added by 213.123.216.147 (talk) 13:38, 18 October 2007 (UTC)
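A small sketch of the expected collision count may help with the first request. It assumes the quantity in question is the expected number of picks that repeat an earlier pick when n values are drawn uniformly from d possibilities, which works out to n − d + d(1 − 1/d)^n; whether this is exactly the formula in the article's section is an assumption. The function names are invented for illustration. It does not address the standard deviation or the longest-chain question.

```python
def expected_collisions(n, d):
    """Expected number of picks that repeat an earlier pick.

    n uniform picks from d values leave d*(1 - 1/d)**n values unused on
    average, so the expected number of distinct values seen is
    d - d*(1 - 1/d)**n, and the rest of the n picks are repeats.
    """
    return n - d + d * (1 - 1 / d) ** n

def approx_collisions(n, d):
    """Leading-order approximation for 1 << n << d."""
    return n * n / (2 * d)

if __name__ == "__main__":
    for n, d in ((23, 365), (1000, 10**6), (10**4, 10**8)):
        print(n, d, expected_collisions(n, d), approx_collisions(n, d))
```

For a password generator, d would be the number of possible passwords and n the number generated in the test run.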

Forward vs. reverse?

Is there any reason for the notion that the "birthday problem" is computing p given n, while the "reverse birthday problem" is computing n given p (as implied in #Reverse problem)? Insofar as the two are separate problems, it seems to me that the ordinary birthday problem is computing the minimal n such that p > ½, so the "reverse birthday problem" would be computing p given n. —RuakhTALK 16:23, 8 December 2006 (UTC)

366 People isn't necessarily 100%

In the article, it claims, "although it cannot actually be 100% unless there are at least 366 people."[1] However, the real number should be 367, because there are 366 days in a leap year. Someone could be born on February 29th, and then no one would have the same birthday out of 366 people. Correct me if I'm wrong, Thanks, RAmen, Demosthenes 21:48, 15 February 2007 (UTC)

Yes, you are correct, but check the footnote. We're assuming 365 days in a year for sake of simplicity in calculations. — Edward Z. Yang(Talk) 22:48, 15 February 2007 (UTC)

Featured Article

I think this is an interesting subject and a great article; I think it's of featured article quality. Comments? I don't know how to tag it for that status myself. —The preceding unsigned comment was added by 83.49.105.6 (talk) 20:56, 12 March 2007 (UTC).

There are instructions at WP:FAC. Greeves (talk contribs) 15:50, 31 March 2007 (UTC)

Conundrum, not paradox

Paradox? Don't you mean conundrum? A paradox is something I thought to be impossible!! And this is not. —The preceding unsigned comment was added by 131.111.8.98 (talkcontribs) 00:05, 18 April 2007 (UTC).

To quote the third sentence of the article:
This is not a paradox in the sense of leading to a logical contradiction; it is called a paradox because mathematical truth contradicts naive intuition: most people estimate that the chance is much lower than 50%.
RuakhTALK 01:49, 18 April 2007 (UTC)


I have only come across this before referred to as the 'Birthday Surprise' which is a better description since it is not a paradox. You should mention that it is also known as 'Birthday Surprise' (as it has in note 4 and the Klamkin and Newman reference).62.255.240.100 12:18, 15 May 2007 (UTC)

Look up "paradox" in a dictionary. The first definition usually is something like, "A true statement that seems not to make sense at first glance." Then, somewhere further down the list of definitions, you'll find the one that says, "a mathematical proposition that can neither be proven, nor disproven." —Preceding unsigned comment added by 192.55.12.36 (talk) 20:08, 29 February 2008 (UTC)

My IB Math Studies Project

Hi people, I have already contributed in the past to this article, and having done my math studies project on this topic, I would like to contribute with my testings.

By the way why is my name always deleted from the references?? I contributed to this article —The preceding unsigned comment was added by 90.0.168.169 (talk) 19:15, 30 April 2007 (UTC).

Hi,
Thanks for your contributions!
Please note that the purpose of the "references" section is to show the sources from which Wikipedians take their information, not to show the Wikipedians who took their information from those sources.
Unless you can cite a specific claim in the article that comes from your high school math project and not from the other references, it looks (and is) really awkward to list a contributor's high school math project in the references. And if there is a claim that comes from your high school math project and not from the other references, then we should consider removing that claim until such time as you get your research published in a peer-reviewed journal; verifiability from reputable sources is a fundamental element of Wikipedia. (This is not in any way to criticize your math project; I'm familiar with the IB program, and am sure that your project was very impressive by high school standards. Kudos to you. But that doesn't make it appropriate as a reference here. After all, just because something was an IB project, that doesn't mean that it garnered a good grade, or that it would stand up to professional scrutiny. IB projects are first and foremost a learning experience.)
RuakhTALK 01:23, 1 May 2007 (UTC)


Hi again. While I was doing my math project, that is, over the past year and a half, I periodically updated this article with what I found in my research. Unfortunately I didn't know that my project would be sent to IB to be verified. This means that they may think I have copied, even though it is the exact contrary :( That's why I would like my name to appear. Anyway, thanks for the answer. —The preceding unsigned comment was added by 90.0.46.7 (talkcontribs) 12:32, 1 May 2007 (UTC).

Oh, scary. Best of luck with that. :-/ —RuakhTALK 15:02, 1 May 2007 (UTC)


So is it ok to put me in the references? —The preceding unsigned comment was added by 90.0.46.7 (talkcontribs) 16:08, 1 May 2007 (UTC).

I'm sorry, but that's still not what references are for. :-/ —RuakhTALK 16:41, 1 May 2007 (UTC)

And in the External links, can I put back the test I did with the players? —The preceding unsigned comment was added by 90.0.46.7 (talkcontribs) 20:36, 1 May 2007 (UTC).

Nonsense.

i didn't believe that these calculations are correct so i made an experiment: i rounded up 3 groups of people. every group consisted of 50 people so there SHOULD be a 97% chance that 2 of them celebrate their birthday on the same day (in each group).

guess what: there weren't 2 guys with the same birthday in none of the groups. in fact there weren't even any guys in these 150 persons who had their birthday on the same day.

mathematically, the chance of that happening is close to 0 (like 0,0x%). so the result of the experiment would be far more than highly unlikely. still, it proves that this calculation (like everything in maths) is just another pathetic attempt of human beings to define the nature of our environment. —The preceding unsigned comment was added by Sonandzon (talkcontribs) 21:26, 5 May 2007 (UTC).

Re: Nonsense
Unsigned, juvenile, biased, and disparaging commentary will be (and has been, here) deleted. The validity of basic statistical and mathematical methodology is beyond reproach. Lesotho 23:09, 5 May 2007 (UTC)
Re: who you talking to? —The preceding unsigned comment was added by 90.4.17.239 (talkcontribs) 12:38, 8 May 2007 (UTC).
Sonandzon, I think this is an interesting study. If what you found is repeatable, it shows that although the mathematics is correct, the assumptions behind it are not. Why don't you do a serious experiment and publish your findings in a peer reviewed conference or journal? -Pgan002 00:26, 22 June 2007 (UTC)
Re: Nonsense: You are obviously lying. If you had done what you claim, you would have found lots of matching pairs. Likebox 14:38, 18 October 2007 (UTC)
Please don't feed trolls. (In this case it's not a big deal, as (s)he's presumably moved on, but it's a bad habit to get into. If you don't know what I'm talking about, please read Wikipedia:What is a troll?.) —RuakhTALK 03:25, 19 October 2007 (UTC)
Assuming that Sonandzon isn't simply being a jerk for the fun of it, it is entirely possible for this to occur. Highly improbable, yes. Impossible, no. 49.199.220.84 (talk) 03:58, 4 October 2011 (UTC)
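To put a number on "highly improbable", here is a rough sketch under the article's own assumptions (365 equally likely birthdays, independent groups). The function name is invented for illustration; the code only works out how unlikely the reported outcome would be and says nothing about what actually happened.

```python
def prob_no_match(n, d=365):
    """Probability that n people all have different birthdays."""
    if n > d:
        return 0.0
    prob = 1.0
    for k in range(n):
        prob *= (d - k) / d
    return prob

if __name__ == "__main__":
    one_group = prob_no_match(50)
    print("one group of 50 with no match:", one_group)
    print("three independent groups, none with a match:", one_group ** 3)
    print("150 people with no match at all:", prob_no_match(150))
```

The first figure is about 3%, consistent with the 97% quoted in the original post; the other two are far smaller.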

Wrong

I disagree with the mathematics on which this article is based.

The probability that in a group of two, both share the same birthday = 1/366.

The probability that in a group of three, one pair share the same birthday (three people make 3 pairs) = (1/366) + (365/366)*(1/365) + (365/366)*(364/365)*(1/364) = 3/366

The probability that in a group of four (i.e. 6 pairs) that one pair share the same birthday is (1/366) + (365/366)*(1/365) + (365/366)*(364/365)*(1/364) + (365/366)*(364/365)*(363/364)*(1/363) + (365/366)*(364/365)*(363/364)*(362/363)*(1/362) + (365/366)*(364/365)*(363/364)*(362/363)*(361/362)*(1/361) = 6/366

From this it can be seen that for a group of n people, there will be n(n−1)/2 pairs, and that the probability that one pair share the same birthday will thus be n(n−1)/(2×366).

Hence the probability that a single pair in a group of people will share the same birthday will be at least 50% when there are 133 pairs, because 133=366/2. This occurs when (approximate to nearest whole).

The probability that a single pair in a group of people will share the same birthday is 100% when there are 366 or more pairs of people: this occurs when (approximate to nearest whole). --Justificatus 15:47, 6 August 2007 (UTC)

The article explicitly skips leap years, but it sounds like you're calculating based on one. Mmernex 16:02, 6 August 2007 (UTC)
You're close, but a bit off. The problem is, it's possible for three people to all share a birthday, and you're counting each case of this as though it were three cases of two people sharing a birthday, when for our purposes it's actually equivalent to just one case of two people sharing a birthday. (And, analogously with larger groups.) In more general terms, remember that P(A ∪ B) = P(A) + P(B) − P(A ∩ B); hence, P(A ∪ B) ≠ P(A) + P(B) (except in the special case that A and B are mutually exclusive). —RuakhTALK 18:39, 6 August 2007 (UTC)
There may be something more deeply wrong with Justificatus' argument than you realize. Read that last paragraph again. It says that the probability of a shared birthday in a random group of 27 people is 100%. That is to say, it's guaranteed. That is to say, there is no possible way to choose 27 people such that each of them has a different birthday.
Justificatus "disagrees" with the math on this page. But, this is not some ivory tower abstraction that we're talking about here, this should be basic common sense. 192.55.12.36 (talk) 20:53, 29 February 2008 (UTC)
I agree with all of your comment except the "more deeply wrong" part. Superficially, the problem with his argument is that its result is nonsensical; but I think "your result is wrong because it contradicts this obvious fact: ___" is less helpful than "your reasoning is flawed because you're making this erroneous assumption: ___". Hence, I tried to explain why his reasoning was flawed, rather than how I knew it was flawed. —RuakhTALK 23:06, 19 April 2008 (UTC)
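The overcounting can also be seen numerically. This is only an illustrative sketch with made-up function names; it uses d = 365 as the article does (the original post used 366, which does not change the pattern). It compares the "number of pairs times 1/d" estimate with the exact probability of at least one shared birthday.

```python
from math import comb

def pair_sum(n, d=365):
    """Estimate obtained by adding 1/d once for every pair of people."""
    return comb(n, 2) / d

def exact_match(n, d=365):
    """Exact probability that at least two of n people share a birthday."""
    no_match = 1.0
    for k in range(n):
        no_match *= (d - k) / d
    return 1.0 - no_match

if __name__ == "__main__":
    for n in (2, 3, 4, 23, 28):
        print(n, pair_sum(n), exact_match(n))
```

The two agree for n = 2, the pair sum is slightly too large from n = 3 onward, and by n = 28 it exceeds 1, which no probability can do.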

Regarding: everything. I think that although mathematically the argument is sound, this problem is based on statistics, and so although it may be X% likely that in a group of Y people (X being smaller than 100, Y being smaller than 367) two people will share the same birthday, one could round up many groups of this size and not find it to be true. The same principle applies to rolling a die (singular of dice): although the probability of rolling any given number is roughly 17%, in practice one could roll a die 200 times and still not get a 6. —Preceding unsigned comment added by 137.205.148.34 (talk) 18:14, 14 May 2010 (UTC)

365 different birthdays -- what am I missing?

I wish to take issue with the statement, "After having met people with n different birthdays (n < 365), the chance that the next person you meet has a colliding birthday is (365-n)/365."

Suppose I am just getting started. I have met one person, and recorded their birthday. The number of different birthdays n is then 1.

The quoted text asserts that there are 364 chances out of 365 that the second person I meet will have the same birthday as the first one. This seems counter-intuitive. Likewise, if my list of birthdates is long, and I have n = 364, then the equation predicts that there is only one chance in 365 that the next person I meet has a birthday that is already on my list.

In summary, the equation says that at the beginning when n is small, I will have a large number of collisions, and as the size of my list of birthdays grows, the likelihood of a collision will decrease. --Jamesglong 18:14, 13 August 2007 (UTC)

Heh, good catch. I'm not sure how to fix this, though; in general, that section is assuming some things I'm not sure about. I think for now I'll label it "disputed", and we can work on fixing it up. —RuakhTALK 18:49, 13 August 2007 (UTC)
I think that (365-n)/365 is the probability that the next person you meet will have a non-colliding birthday. After you've collected one birthday, there is a good chance (364 in 365) that the next person will have a different birthday. Once you've collected 365 birthdays, you cannot collect any new birthdays and the probability is zero.--143.215.153.96 19:53, 14 August 2007 (UTC)

Coupon collector's problem

I didn't find a page for this problem on Wikipedia, although it is related to this article. There are N coupons on a desk, and you have to pick from them randomly (without taking the picked one from the desk). What is the expected number of picks until you have picked all of them at least once?

Google found the answer N*(ln N + Euler's constant) here —Preceding unsigned comment added by 81.183.27.21 (talk) 17:20, August 24, 2007 (UTC)
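As a quick check of that answer, here is a sketch with hypothetical function names. It assumes each pick is uniform over the N coupons and independent of earlier picks, in which case the exact expectation is N times the N-th harmonic number, and N(ln N + γ) is its large-N approximation.

```python
import math

# Euler-Mascheroni constant, hard-coded because the standard math
# module does not provide it.
EULER_GAMMA = 0.5772156649015329

def expected_picks_exact(n):
    """Exact expectation: n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(1 / k for k in range(1, n + 1))

def expected_picks_approx(n):
    """Large-n approximation: n * (ln n + gamma)."""
    return n * (math.log(n) + EULER_GAMMA)

if __name__ == "__main__":
    for n in (10, 365, 10000):
        print(n, expected_picks_exact(n), expected_picks_approx(n))
```

The approximation runs about half a pick below the exact value, so for N = 365 both give roughly 2,365 picks.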

The pigeonhole principle says that there MUST be at least two people with the same birthday if you have 367 or more people. Imagine 366 people (allowing for leap years), with one birthday on each of the 366 days of the (leap) year. When you introduce the 367th person, this person's birthday can only fall on one of the already occupied days. —Preceding unsigned comment added by 80.2.196.26 (talk) 18:22, 14 September 2007 (UTC)

Please read the first footnote. :-) —RuakhTALK 21:43, 14 September 2007 (UTC) —Preceding unsigned comment added by Ruakh (talkcontribs)

How about sharing the same birthday date?

I think one of the earlier posts discussing a test with groups of 50 may have been looking for the same birthday date (month/day/year) as opposed to simply the same day (month/day). Can you compute the odds for the same date? I suppose the distribution of people would impact the odds - but, can you make estimates based on assumptions for your groups ("even distribution of ages 20-60", or "distribution mirroring earth's population at each age")? Now that seems like a fun exercise. And one I'm not capable of completing.  :-(

69.134.204.75 20:58, 14 September 2007 (UTC)

Well, the "even distribution of ages 20–60" version is not hard: it's the same as the normal birthday problem, but assuming 14,975 distinct days instead of 365. With that assumption, it would take 145 people before you have a 50% chance that two had the same DOB — still amazingly few, IMHO. The latter version would obviously require knowing that distribution, which I do not. (And the math would be significantly more complicated, too. I'm actually not sure exactly how one could go about that, other than a brute-force or probabilistic method.) —RuakhTALK 21:53, 14 September 2007 (UTC) —Preceding unsigned comment added by Ruakh (talkcontribs)
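The uniform version of this calculation can be sketched as follows. The function name is made up, and the code assumes every one of the d possible dates of birth is equally likely, as in the "even distribution" case above. It does not attempt the population-weighted version.

```python
def people_needed(d, target=0.5):
    """Smallest group size whose chance of a shared date reaches target,
    assuming d equally likely dates."""
    no_match = 1.0  # probability that the first n people are all distinct
    n = 1
    while 1.0 - no_match < target:
        no_match *= (d - n) / d  # person n+1 must avoid n taken dates
        n += 1
    return n

if __name__ == "__main__":
    print("365 days:", people_needed(365))
    print("14,975 days:", people_needed(14975))
```

The same function can be called with any other number of distinct dates, for example a narrower age range.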

GA Review

This article does not meet the Good Article criteria at this time, and will not be listed. Currently, I'd probably grade it as somewhere on the border of Start-class and B-class. The main issues are some major organizational and prose issues, as well as insufficient reference citations. I am also unclear whether the items mentioned under 'references' were really used to back up anything in the article at all, or are, in fact, just extra books that some might want to use to look up additional information about the subject. In which case, the section should be renamed to further reading.

There's a considerable amount of text in the 'notes' section. Why isn't this used in the article itself. These are clearly not footnotes! It would help editors to review WP:CITE for guidelines on how to properly use inline citations to cite material in an article.

The section header titles are fairly long, and not very clear and concise. It would be advisable to shorten and simplify these so that they are easier to understand when reading the table of contents. Some sections might be reorganized and merged with others as well, and check that all second, third, and fourth level subsection headers are absolutely necessary.

Many sections seem overly dependent on some very complex equations with very little text to actually describe the significance and meaning of these equations. This is likely to scare off many readers. Equations should also have a citation, so that we can verify that what is being included is clearly not original research. The citation should not appear next to the equation itself; it should appear in the sentence preceding and referring to the equation, usually immediately after a colon, such as in "the probability that two birthdays coincide can be attributed to the following equation:[1]"

The lead section actually does a reasonably good job of introducing the topic, but could better summarize the article. It might help to review WP:LEAD for tips on this. Also, the 'birthday attack' is mentioned in the lead, and briefly in the text, but there is no source for this information, and the text is pretty vague.

Hopefully, this will help editors improve the article. Good luck! Dr. Cash 04:59, 2 October 2007 (UTC)

footnote86.42.66.94 (talk) 00:59, 10 January 2008 (UTC)
Thanks! :-) —RuakhTALK 14:50, 2 October 2007 (UTC)

Knapsack Problem?

The formulation of the knapsack problem on this page is quite different from the knapsack problem with which I am familiar, which is as it is described in its own Wikipedia page. A bit of research reveals that what is described here appears to be the partition problem. The partition problem is a special case of the subset-sum problem, which is itself a special case of the knapsack problem. Thus, I suppose the characterization in this article is not strictly incorrect, but it is certainly misleading. I apologize if this discussion ends up at the wrong place; this is my first contribution to Wikipedia and I am not quite sure what I am doing. —Preceding unsigned comment added by 216.164.23.210 (talk) 03:24, 26 October 2007 (UTC)

It ended up in exactly the right place! It's a good idea to sign your talk entries (but not article changes) - writing four tildes (~) or clicking the signature button over the edit window will do that. If you are not logged in, it will just add your IP number and a timestamp (click "Show preview" below the edit window to see the effect).
I've changed the article to say "a variant of the knapsack problem"; I think it's correct this way, though it could still be improved upon. Go ahead, be bold, and do so, if you see a way!--Niels Ø (noe) 08:12, 26 October 2007 (UTC)
  1. ^ your citation