Identical twins and Bayes' theorem in the 21st century

[version 1; peer review: 1 not approved]
PUBLISHED 17 Dec 2013
Abstract

In a recent article in Science on "Bayes' Theorem in the 21st Century", Bradley Efron uses Bayes' theorem to calculate the probability that twins are identical given that the sonogram shows twin boys. He concludes that Bayesian calculations cannot be uncritically accepted when using uninformative priors. We argue that this conclusion is problematic for three reasons. Efron's example on identical twins does not use data and hence is not Bayesian statistics. His priors are not appropriate, and the prior he calls uninformative is not uninformative. Finally, using the available data point and an uninformative prior actually leads to a reasonable posterior distribution.

Correspondence

Efron (2013a) provides four examples of Bayesian analyses, two of which underline the remarkable potential of Bayesian methods. Based on one of the other examples, however, Efron ultimately concludes that Bayesian analyses using uninformative priors cannot be uncritically accepted and should be checked by frequentist methods. While we wholeheartedly agree that statistical results should not be uncritically accepted, we find Efron’s example ineffective in showing that Bayesian statistics require more careful checking than any other kind of statistics.

In his example on uninformative priors, Efron uses Bayes’ theorem to calculate the probability that twins are identical given that the sonogram shows twin boys. Efron finds this probability to be 2/3 when using an uninformative prior versus 1/2 with an informative prior, and thereby concludes that an uninformative prior does not have the desired neutral effect on the output of Bayes’ rule. We argue that this example is not only flawed but also useless for illustrating Bayesian data analysis, because it does not rely on any data. Although there is one data point (a couple is due to be parents of twin boys, and the twins are fraternal), Efron does not use it to update prior knowledge. Instead, Efron combines different pieces of expert knowledge from the doctor and from genetics using Bayes’ theorem.

While Bayes’ theorem is certainly an impeccable law of probability, it is a mathematical equation, not a statistical model describing how data may be produced. In essence, Efron uses this equation to show that the value on the left side changes when a term on the right side is changed. This is trivial and could be shown with any mathematical equation, in a non-Bayesian context as well. Without new data, our knowledge is by definition determined by prior information; thus, showing that the outcome of a Bayesian analysis with no new data is heavily influenced by the prior would not argue against Bayesian methods. Indeed, without data, Efron’s example is not Bayesian statistics, and his conclusion about Bayesian statistics based on this example is unjustified.
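To make the point concrete, Efron's calculation can be reproduced in a few lines. This is a minimal sketch, assuming the genetic likelihoods from Efron's sidebar, which are not restated in this text: identical twins are both boys with probability 1/2, fraternal twins with probability 1/4. The function name `prob_identical_given_boys` is ours. Note that no data enter the calculation; the only thing that differs between the two results is the number plugged in for P(A).

```python
from fractions import Fraction

# Assumed likelihoods (standard genetics, as in Efron's sidebar):
# identical twins share a sex, so both are boys with probability 1/2;
# fraternal twins are each a boy independently, so both are boys with probability 1/4.
P_B_GIVEN_IDENTICAL = Fraction(1, 2)
P_B_GIVEN_FRATERNAL = Fraction(1, 4)


def prob_identical_given_boys(p_identical: Fraction) -> Fraction:
    """Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B), with A = identical, B = twin boys."""
    numerator = P_B_GIVEN_IDENTICAL * p_identical
    # Law of total probability for the denominator P(B).
    p_boys = numerator + P_B_GIVEN_FRATERNAL * (1 - p_identical)
    return numerator / p_boys


doctor = prob_identical_given_boys(Fraction(1, 3))  # informative "doctor's prior"
flat = prob_identical_given_boys(Fraction(1, 2))    # value used as the "uninformative" prior
print(doctor, flat)  # 1/2 2/3
```

Changing the input from 1/3 to 1/2 moves the output from 1/2 to 2/3, which is exactly the equation-level effect described above.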

We also have other, more technical issues with Efron’s example. Efron interprets the term P(A) on the right side of the equation (see sidebar in Efron 2013a) as the prior on the probability that twins are identical. To make this prior uninformative, it is assigned a value of P(A) = 0.5 (see Efron 2013b; this is not stated in Efron 2013a). This uninformative prior is set in contrast to the informative “doctor’s prior” of P(A) = 1/3. First, however, according to Efron’s study question (see sidebar in Efron 2013a), the parameter of interest is P(A|B) rather than P(A), so the focus should be on an appropriate prior for P(A|B). Second, for the uninformative prior, Efron erroneously states that he used a uniform distribution between zero and one, which is clearly different from the fixed value of 0.5 that was actually used. Third, we find it at least debatable whether a prior can be called uninformative if it has a fixed value of 0.5 given without any measure of uncertainty. For example, if we knew that our chance of winning the next million-dollar jackpot were 50:50, would we really call this uninformative?

If we use the data point together with an uninformative uniform prior on P(A|B) to determine the probability of identical twins given the twins are two boys (see Box 1), we obtain, with 95% certainty, a probability of between 0.01 and 0.84; if we use a highly informative prior based on information from the doctor and genetics, we obtain a probability of between 0.49 and 0.51. This looks completely reasonable to us, although of course we do not know much more than we knew before because we had only a single data point.

We would very much like to check our calculations using frequentist methods; however, this is impossible because there is only one data point, and frequentist methods generally cannot handle such situations. Although we agree with Efron (2013a) that the choice of the prior is essential, we conclude that his article gives a biased impression of the influence of uninformative priors. In his example using Bayes’ theorem, we found no reliable support for his main conclusion that Bayesian calculations cannot be uncritically accepted when using uninformative priors.

Box 1. Study question: What is the probability of identical twins given the twins are two boys?

Data: One pair of twin boys is fraternal.

Data model: x~Binomial(θ, n), where θ is the probability of identical twins given the twins are two boys, x is the number of identical twins in the data, and n is the total number of pairs of twin boys; in our case: x = 0 and n = 1.

The posterior distribution p(θ|x) is obtained using Bayes’ theorem

p(θ|x) = p(x|θ)p(θ)/p(x)
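The roles of the terms in this equation can be seen by evaluating it numerically on a grid of θ values. This is a sketch only: the grid of 1001 points and the function name `grid_posterior` are illustrative choices of ours, and p(x) is approximated by normalizing the product p(x|θ)p(θ) over the grid. With a uniform prior (the first of the two priors listed below) and the single observation x = 0, n = 1, the posterior is proportional to 1 − θ, so its mean should be close to 1/3.

```python
from math import comb


def grid_posterior(x: int, n: int, prior, points: int = 1001):
    """Approximate p(theta | x) on an evenly spaced grid over [0, 1]."""
    thetas = [i / (points - 1) for i in range(points)]
    # Unnormalized posterior: binomial likelihood p(x | theta) times prior p(theta).
    unnorm = [comb(n, x) * t**x * (1 - t) ** (n - x) * prior(t) for t in thetas]
    # p(x) is the normalizing constant, approximated here by the grid sum.
    evidence = sum(unnorm)
    weights = [u / evidence for u in unnorm]
    return thetas, weights


# Uniform Unif(0, 1) prior; data: 0 identical pairs out of 1 pair of twin boys.
thetas, weights = grid_posterior(x=0, n=1, prior=lambda t: 1.0)
posterior_mean = sum(t * w for t, w in zip(thetas, weights))
print(round(posterior_mean, 3))  # 0.333
```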

We use two different priors p(θ):

1) Uninformative prior: p(θ) = Unif(0,1) = Beta(1,1).

2) Informative prior: using the information from the doctor and from genetics, we are quite sure that θ must be around 0.5. Transforming this information into a statistical distribution yields p(θ) = Beta(10000, 10000), which has a mean of 0.5 and a 95% interval of 0.49307 – 0.50693. [Note that we had to choose the 95% interval arbitrarily because we have no information on the certainty of the information provided by the doctor and by genetics.]
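The stated mean and interval of the informative prior can be checked from the moments of the Beta distribution. The sketch below is ours: the function name `beta_summary` is hypothetical, and it uses a normal approximation (mean ± 1.96 standard deviations) rather than exact Beta quantiles, an approximation that is very accurate when α and β are as large as here.

```python
from math import sqrt
from statistics import NormalDist


def beta_summary(a: float, b: float, level: float = 0.95):
    """Mean and an approximate central interval of Beta(a, b) via a normal approximation."""
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # standard deviation of Beta(a, b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return mean, mean - z * sd, mean + z * sd


mean, lower, upper = beta_summary(10000, 10000)
print(f"{mean:.1f} {lower:.5f} {upper:.5f}")  # 0.5 0.49307 0.50693
```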

Given the single-parameter binomial model, x~Binomial(θ, n), and the prior p(θ) = Beta(α,β), the solution of the Bayesian analysis is the posterior distribution p(θ|x) = Beta(α+x,β+n-x) [see any Bayesian textbook, e.g. Gelman et al. 2004, p. 34].
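Because the Beta prior is conjugate to the binomial likelihood, the update reduces to simple parameter arithmetic. A minimal sketch (the function name `update_beta` is ours), applied to both priors with the single data point x = 0, n = 1; the posterior mean of Beta(α, β) is α/(α + β).

```python
def update_beta(alpha: int, beta: int, x: int, n: int) -> tuple:
    """Conjugate update: Beta(alpha, beta) prior plus x successes in n trials gives Beta(alpha + x, beta + n - x)."""
    return alpha + x, beta + n - x


x, n = 0, 1  # one pair of twin boys, observed to be fraternal (0 identical pairs out of 1)
priors = {"uninformative": (1, 1), "informative": (10000, 10000)}
for name, (alpha, beta) in priors.items():
    a, b = update_beta(alpha, beta, x, n)
    # Posterior mean of Beta(a, b) is a / (a + b).
    print(name, (a, b), round(a / (a + b), 5))
# uninformative (1, 2) 0.33333
# informative (10000, 10001) 0.49998
```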

The probability of identical twins given the twins are two boys:

1) Uninformative prior: p(θ|x) = Beta(1+x,1+n-x) = Beta(1+0,1+1-0) = Beta(1, 2), which has an expected value of 0.33 and a 95% interval of 0.013 – 0.84.

2) Informative prior: p(θ|x) = Beta(10000+x,10000+n-x) = Beta(10000+0,10000+1-0) = Beta(10000, 10001), which has an expected value of 0.49998 and a 95% interval of 0.49305 – 0.50690.
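The posterior intervals above can be reproduced without a statistics library. This sketch relies on two shortcuts of our choosing: Beta(1, b) has the CDF 1 − (1 − θ)^b, so its quantiles have a closed form; for Beta(10000, 10001) we use a normal approximation, which is accurate at these parameter values but is not an exact quantile computation. Both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def beta1_quantile(b: float, p: float) -> float:
    """Exact p-quantile of Beta(1, b), obtained by inverting the CDF 1 - (1 - t)**b."""
    return 1 - (1 - p) ** (1 / b)


def beta_normal_interval(a: float, b: float, level: float = 0.95):
    """Approximate central interval of Beta(a, b) for large a and b."""
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sd, mean + z * sd


# Uninformative prior -> posterior Beta(1, 2): exact 2.5% and 97.5% quantiles.
flat_lo, flat_hi = beta1_quantile(2, 0.025), beta1_quantile(2, 0.975)
# Informative prior -> posterior Beta(10000, 10001): normal approximation.
info_lo, info_hi = beta_normal_interval(10000, 10001)
print(f"{flat_lo:.3f} {flat_hi:.2f}")  # 0.013 0.84
print(f"{info_lo:.5f} {info_hi:.5f}")  # 0.49305 0.50690
```

The wide interval under the uniform prior reflects that a single data point carries little information, while the informative prior barely moves.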

Comments on this article (1)

Version 2 (revised), published 29 Jul 2015
Version 1, published 17 Dec 2013. Discussion is closed on this version; please comment on the latest version.

Reader Comment, 06 Jan 2014
M Aaron MacNeil, Australian Institute of Marine Science, Australia
This clear, effective review highlights brilliantly a recurrent fundamental error in quantitative sciences, namely 'what is the question being asked?' By identifying the assumptions behind a problem - as McCarthy ...
How to cite this article
Amrhein V, Roth T and Korner-Nievergelt F. Identical twins and Bayes' theorem in the 21st century [version 1; peer review: 1 not approved]. F1000Research 2013, 2:278 (https://doi.org/10.12688/f1000research.2-278.v1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
Open Peer Review

Key to reviewer statuses:
Approved - The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations - A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved - Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report on Version 1 (published 17 Dec 2013), 24 Dec 2013
Michael McCarthy, School of Botany, University of Melbourne, Melbourne, Australia
Status: Not Approved
This paper by Amrhein et al. criticizes a paper by Bradley Efron that discusses Bayesian statistics (Efron, 2013a), focusing on a particular example that was also discussed in Efron (2013b). The example concerns a woman who is carrying twins, both ...

How to cite this report: McCarthy M. Reviewer Report For: Identical twins and Bayes' theorem in the 21st century [version 1; peer review: 1 not approved]. F1000Research 2013, 2:278 (https://doi.org/10.5256/f1000research.3175.r2816)
Author Response, 29 Jul 2015
Valentin Amrhein, Research Station Petite Camargue Alsacienne, 68300 Saint-Louis, France
We would like to sincerely thank Michael McCarthy for his thorough review, and we revised our paper accordingly. McCarthy's main point is that Efron's calculations and our approach differ because ...