Keywords
Bayes' theorem, identical twins, Bayesian, uninformative priors
In this revised manuscript, we have clarified that our approach differs from the calculations provided by Efron. We have also shortened the manuscript and removed statements that were criticized by referee Michael McCarthy.
See the authors' detailed response to the review by Michael McCarthy.
Efron¹ provides four examples of Bayesian analyses, two of which underline the remarkable potential of Bayesian methods. Based on one of the other examples, however, Efron ultimately concludes that Bayesian analyses using uninformative priors cannot be uncritically accepted and should be checked by frequentist methods. While we wholeheartedly agree that statistical results should not be uncritically accepted, we find Efron’s example ineffective in showing that Bayesian statistics require more careful checking than any other kind of statistics.
In his example on uninformative priors, Efron uses Bayes’ theorem to calculate the probability that twins are identical given that the sonogram shows twin boys. He finds this probability to be 2/3 with an uninformative prior versus 1/2 with an informative prior, and thereby concludes that an uninformative prior does not have the desired neutral effect on the output of Bayes’ rule. We argue that this example is of little use in illustrating Bayesian data analysis. One reason is that Efron treats this particular pair of twin boys as the entire population. In that case, statistics is not needed, because no random sample is drawn from a larger population. Rather, Efron combines different pieces of expert knowledge, from the doctor and from genetics, using Bayes’ theorem. Although certainly an impeccable law of probability, Bayes’ theorem is a mathematical equation, not a statistical model describing how data may be produced. In essence, Efron uses this equation to show that the value on the left-hand side changes when a term on the right-hand side is changed. This is trivial and could be shown with any mathematical equation, in a Bayesian or a non-Bayesian context.
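To make this point concrete, the following Python sketch reproduces the two numbers with exact fractions. The inputs are our assumptions rather than values stated above: a doctor's prior of 1/3 for identical twins, and likelihoods P(two boys | identical) = 1/2 (identical twins share one sex) and P(two boys | fraternal) = 1/4 (independent sexes). Only the prior term on the right-hand side is changed, and the result on the left-hand side changes with it.

```python
from fractions import Fraction

# Assumed likelihoods (standard genetics, as in Efron's setup):
# identical twins share one sex, so P(two boys | identical) = 1/2;
# fraternal twins have independent sexes, so P(two boys | fraternal) = 1/4.
P_BOYS_GIVEN_IDENTICAL = Fraction(1, 2)
P_BOYS_GIVEN_FRATERNAL = Fraction(1, 4)


def p_identical_given_boys(prior_identical: Fraction) -> Fraction:
    """Bayes' theorem: P(identical | two boys) for a given prior P(identical)."""
    numerator = P_BOYS_GIVEN_IDENTICAL * prior_identical
    evidence = numerator + P_BOYS_GIVEN_FRATERNAL * (1 - prior_identical)
    return numerator / evidence


# Informative prior from the doctor (assumed here: 1/3 of twins are identical)
informative = p_identical_given_boys(Fraction(1, 3))
# "Uninformative" prior: identical and fraternal equally likely
uninformative = p_identical_given_boys(Fraction(1, 2))

print(informative, uninformative)  # 1/2 2/3
```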
Efron’s example can be rearranged to fit a more realistic situation in statistical data analysis, albeit with a very small sample size. We treat the twin boys who, as Efron mentions in passing, turned out to be fraternal as a random sample from the larger population of twin boys, and try to draw inference about the proportion of identical twins in that population (note that this approach differs from the calculations provided by Efron). If we combine this single data point with an uninformative uniform prior on P(A|B) (see Box 1), the probability of identical twins given that the twins are two boys lies, with 95% certainty, between 0.01 and 0.84. With a highly informative prior based on the information from the doctor and from genetics, it lies between 0.49 and 0.51. This looks completely reasonable to us, although of course we learn little beyond what we knew before, because we had only a single data point. We think that such an approach would be a fairer way than Efron’s calculations to illustrate the influence of uninformative priors on the results of Bayesian data analyses.
Box 1
Data: One pair of twin boys is fraternal.
Data model: x ~ Binomial(θ, n), where θ is the probability that twins are identical given that they are two boys, x is the number of identical twin pairs in the data, and n is the total number of pairs of twin boys. In our case, x = 0 and n = 1.
The posterior distribution p(θ|x) is obtained using Bayes' theorem:
p(θ|x) = p(x|θ)p(θ)/p(x)
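As a numerical illustration of this formula applied to the data model, the sketch below evaluates likelihood times prior on a grid of θ values and divides by a numerical estimate of p(x). The grid size and function names are our own illustrative choices. With a uniform prior and x = 0, n = 1, the posterior density is proportional to 1 − θ, which normalizes to 2(1 − θ) with mean 1/3.

```python
from math import comb


def binomial_likelihood(theta: float, x: int, n: int) -> float:
    """p(x | theta) under the data model x ~ Binomial(theta, n)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def uniform_prior(theta: float) -> float:
    """Density of Unif(0, 1)."""
    return 1.0


def grid_posterior(prior, x: int, n: int, points: int = 10_001):
    """Approximate p(theta | x) = p(x | theta) p(theta) / p(x) on a grid.

    p(x) is approximated by a simple Riemann sum over the grid.
    """
    step = 1.0 / (points - 1)
    grid = [i * step for i in range(points)]
    unnormalized = [binomial_likelihood(t, x, n) * prior(t) for t in grid]
    evidence = sum(unnormalized) * step  # numerical estimate of p(x)
    density = [u / evidence for u in unnormalized]
    return grid, density, step


grid, density, step = grid_posterior(uniform_prior, x=0, n=1)
posterior_mean = sum(t * d for t, d in zip(grid, density)) * step
print(round(posterior_mean, 3))  # 0.333
```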
We use two different priors p(θ):
1) Uninformative prior: p(θ) = Unif(0,1) = Beta(1,1)
2) Informative prior: using the information from the doctor and from genetics, we are quite sure that θ must be around 0.5¹. Transforming this information into a probability distribution yields p(θ) = Beta(10000, 10000), which has a mean of 0.5 and a 95% interval of 0.493 – 0.507. [Note that we had to choose the width of the 95% interval arbitrarily, because we have no information about how certain the doctor's and the genetic information is.]
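The stated mean and 95% interval of this prior can be checked with the standard formulas for the mean and variance of a Beta distribution. The sketch below uses a normal approximation for the interval. We assume this is adequate because both shape parameters are very large; it is not an exact Beta quantile, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def beta_mean_sd(alpha: float, beta: float) -> tuple[float, float]:
    """Exact mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total**2 * (total + 1))
    return mean, sqrt(var)


def approx_central_interval(alpha: float, beta: float, level: float = 0.95):
    """Normal approximation to the central interval of Beta(alpha, beta).

    Assumption: only reasonable when alpha and beta are both large.
    """
    mean, sd = beta_mean_sd(alpha, beta)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sd, mean + z * sd


lo, hi = approx_central_interval(10_000, 10_000)
print(f"{lo:.3f} - {hi:.3f}")  # 0.493 - 0.507
```

Run in reverse, the same calculation shows how the arbitrary choice of interval width maps to α = β: a narrower interval requires larger shape parameters.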
Given the single-parameter binomial model, x ~ Binomial(θ, n), and the prior p(θ) = Beta(α, β), the solution of the Bayesian analysis is the posterior distribution p(θ|x) = Beta(α+x, β+n-x) [see any Bayesian textbook, e.g. Gelman et al. 2004², p. 34].
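The conjugate update amounts to adding the observed counts to the prior parameters. The following minimal Python sketch (the function name is our own) applies it to both priors with x = 0 and n = 1.

```python
def beta_binomial_update(alpha: float, beta: float, x: int, n: int) -> tuple[float, float]:
    """Conjugate update: a Beta(alpha, beta) prior and x successes in n trials
    give a Beta(alpha + x, beta + n - x) posterior."""
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    return alpha + x, beta + n - x


x, n = 0, 1  # one pair of twin boys, fraternal: 0 identical pairs out of 1
print(beta_binomial_update(1, 1, x, n))            # (1, 2)
print(beta_binomial_update(10_000, 10_000, x, n))  # (10000, 10001)
```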
The posterior probability of identical twins, given that the twins are two boys:
1) Uninformative prior: p(θ|x) = Beta(1+x,1+n-x) = Beta(1+0,1+1-0) = Beta(1, 2), which has an expected value of 0.33 and a 95% interval of 0.013 – 0.84.
2) Informative prior: p(θ|x) = Beta(10000+x,10000+n-x) = Beta(10000+0,10000+1-0) = Beta(10000, 10001), which has an expected value of 0.50 and a 95% interval of 0.49 – 0.51.
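These posterior summaries can be checked numerically. The following Python sketch is our illustration, not the computation used in the original analysis. It estimates each posterior's mean and central 95% interval by Monte Carlo with the standard library's `random.betavariate`, and cross-checks the Beta(1, 2) interval against its closed-form quantile function. The number of draws and the seed are arbitrary choices.

```python
import random


def beta_summary(alpha: float, beta: float, draws: int = 100_000, seed: int = 1):
    """Monte Carlo summary of Beta(alpha, beta).

    Returns (mean, 2.5% quantile, 97.5% quantile) estimated from sorted draws.
    """
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    mean = sum(samples) / draws
    lower = samples[round(0.025 * draws)]
    upper = samples[round(0.975 * draws) - 1]
    return mean, lower, upper


# Closed-form check for Beta(1, 2): its CDF is F(t) = 1 - (1 - t)**2,
# so its quantile function is 1 - sqrt(1 - p).
exact_lower = 1 - (1 - 0.025) ** 0.5  # about 0.0126
exact_upper = 1 - (1 - 0.975) ** 0.5  # about 0.842

for a, b in [(1, 2), (10_000, 10_001)]:
    mean, lower, upper = beta_summary(a, b)
    print(f"Beta({a}, {b}): mean {mean:.2f}, 95% interval {lower:.3f} to {upper:.3f}")
print(f"Exact Beta(1, 2) interval: {exact_lower:.3f} to {exact_upper:.3f}")
```

Because the Monte Carlo estimates carry sampling error, the last printed digit may vary with the seed.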
Although we agree with Efron¹ that the choice of the prior is essential, we conclude that his article gives a biased impression of the influence of uninformative priors. His example using Bayes’ theorem provides no reliable support for his main conclusion that Bayesian calculations using uninformative priors cannot be uncritically accepted.
FK-N analyzed the data point. VA wrote the first draft of the manuscript. All authors contributed to the discussion and approved the final version of the manuscript.
This work was funded by the Swiss Association Pro Petite Camargue Alsacienne and the Fondation de Bienfaisance Jeanne Lovioz.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The authors would like to thank Yves-Laurent Grize and Pius Korner for discussions, and Michael McCarthy for valuable comments on the manuscript.
Is the rationale for commenting on the previous publication clearly described?
Yes
Are any opinions stated well-argued, clear and cogent?
Yes
Are arguments sufficiently supported by evidence from the published literature or by new data and results?
No
Is the conclusion balanced and justified on the basis of the presented arguments?
No
Competing Interests: No competing interests were disclosed.
Alongside their report, reviewers assign a status to the article:
Invited Reviewers | 1 | 2
---|---|---
Version 2 (revision), 29 Jul 15 | read | read
Version 1, 17 Dec 13 | read |
This critique also has several important elements that would benefit all peer review:
If only all science operated in the same vein, we'd be far better off.