Peer Review Examples - F1000Research

Home For Reviewers Examples

Peer Review Examples

Good explanation of relevance of article

Annalisa Pastore says:

The genesis of this paper is the proposal that genomes containing a poor percentage of guanosine and cytosine (GC) nucleotide pairs lead to proteomes more prone to aggregation than those encoded by GC-rich genomes. As a consequence these organisms are also more dependent on the protein folding machinery. If true, this interesting hypothesis could establish a direct link between the tendency to aggregate and the genomic code.

Read full report

In their paper, the authors have tested the hypothesis on the genomes of eubacteria using a genome-wide approach based on multiple machine learning models. Eubacteria are an interesting set of organisms which have an appreciably high variation in their nucleotide composition with the percentage of CG genetic material ranging from 20% to 70%. The authors classified different eubacterial proteomes in terms of their aggregation propensity and chaperone-dependence. For this purpose, new classifiers had to be developed which were based on carefully curated data. They took account for twenty-four different features among which are sequence patterns, the pseudo amino acid composition of phenylalanine, aspartic and glutamic acid, the distribution of positively charged amino acids, the FoldIndex score and the hydrophobicity. These classifiers seem to be altogether more accurate and robust than previous such parameters.

The authors found that, contrary to what expected from the working hypothesis, which would predict a decrease in protein aggregation with an increase in GC richness, the aggregation propensity of proteomes increases with the GC content and thus the stability of the proteome against aggregation increases with the decrease in GC content. The work also established a direct correlation between GC-poor proteomes and a lower dependence on GroEL. The authors conclude by proposing that a decrease in eubacterial GC content may have been selected in organisms facing proteostasis problems. A way to test the overall results would be through in vitro evolution experiments aimed at testing whether adaptation to low GC content provide folding advantage.

The main strengths of this paper is that it addresses an interesting and timely question, finds a novel solution based on a carefully selected set of rules, and provides a clear answer. As such this article represents an excellent and elegant bioinformatics genome-wide study which will almost certainly influence our thinking about protein aggregation and evolution. Some of the weaknesses are the not always easy readability of the text which establishes unclear logical links between concepts.

Another possible criticism could be that, as any in silico study, it makes strong assumptions on the sequence features that lead to aggregation and strongly relies on the quality of the classifiers used. Even though the developed classifiers seem to be more robust than previous such parameters, they remain only overall indications which can only allow statistical considerations. It could of course be argued that this is good enough to reach meaningful conclusions in this specific case.

View Article & Reports

Show less

Detailed analysis puts article in context

Eva Delpon and Ricardo Caballero say:

The paper by Chevalier et al. analyzed whether late sodium current (I_NaL) can be assessed using an automated patch-clamp device. To this end, the I_NaL effects of ranolazine (a well known I_NaL inhibitor) and veratridine (an I_NaL activator) were described. The authors tested the CytoPatch automated patch-clamp equipment and performed whole-cell recordings in HEK293 cells stably transfected with human Nav1.5. Furthermore, they also tested the electrophysiological properties of human induced pluripotent stem cell-derived cardiomyocytes (hiPS) provided by Cellular Dynamics International. The title and abstract are appropriate for the content of the text. Furthermore, the article is well constructed, the experiments were well conducted, and analysis was well performed.

Read full report

I_NaL is a small current component generated by a fraction of Nav1.5 channels that instead to entering in the inactivated state, rapidly reopened in a burst mode. I_NaL critically determines action potential duration (APD), in such a way that both acquired (myocardial ischemia and heart failure among others) or inherited (long QT type 3) diseases that augmented the I_NaL magnitude also increase the susceptibility to cardiac arrhythmias. Therefore, I_NaL has been recognized as an important target for the development of drugs with either antiischemic or antiarrhythmic effects. Unfortunately, accurate measurement of I_NaL is a time consuming and technical challenge because of its extra-small density. The automated patch clamp device tested by Chevalier et al. resolves this problem and allows fast and reliable I_NaL measurements.

The results here presented merit some comments and arise some unresolved questions. First, in some experiments (such is the case in experiments B and D in Figure 2) current recordings obtained before the ranolazine perfusion seem to be quite unstable. Indeed, the amplitude progressively increased to a maximum value that was considered as the control value (highlighted with arrows). Can this problem be overcome? Is this a consequence of a slow intracellular dialysis? Is it a consequence of a time-dependent shift of the voltage dependence of activation/inactivation? Second, as shown in Figure 2, intensity of drug effects seems to be quite variable. In fact, experiments A, B, C, and D in Figure 2 and panel 2D, demonstrated that veratridine augmentation ranged from 0-400%. Even assuming the normal biological variability, we wonder as to whether this broad range of effect intensities can be justified by changes in the perfusion system. Has been the automated dispensing system tested? If not, we suggest testing the effects of several K⁺ concentrations on inward rectifier currents generated by Kir2.1 channels (I_Kir2.1).

The authors demonstrated that the recording quality was so high that the automated device allows to the differentiation between noise and current, even when measuring currents of less than 5 pA of amplitude. In order to make more precise mechanistic assumptions, the authors performed an elegant estimation of current variance (σ²) and macroscopic current (I) following the procedure described more than 30 years ago by Van Driessche and Lindemann ¹. By means of this method, Chevalier et al. reducing the open channel probability, while veratridine increases the number of channels in the burst mode. We respectfully would like to stress that these considerations must be put in context from a pharmacological point of view. We do not doubt that ranolazine acts as an open channel blocker, what it seems clear however, is that its onset block kinetics has to be “ultra” slow, otherwise ranolazine would decrease peak I_NaL even at low frequencies of stimulation. This comment points towards the fact that for a precise mechanistic study of ionic current modifying drugs it is mandatory to analyze drug effects with much more complicated pulse protocols. Questions thus are: does this automated equipment allow to the analysis of the frequency-, time-, and voltage-dependent effects of drugs? Can versatile and complicated pulse protocols be applied? Does it allow to a good voltage control even when generated currents are big and fast? If this is not possible, and by means of its extraordinary discrimination between current and noise, this automated patch-clamp equipment will only be helpful for rapid I_NaL-modifying drug screening. Obviously it will also be perfect to test HERG blocking drug effects as demanded by the regulatory authorities.

Finally, as cardiac electrophysiologists, we would like to stress that it seems that our dream of testing drug effects on human ventricular myocytes seems to come true. Indeed, human atrial myocytes are technically, ethically and logistically difficult to get, but human ventricular are almost impossible to be obtained unless from the explanted hearts from patients at the end stage of cardiac diseases. Here the authors demonstrated that ventricular myocytes derived from hiPS generate beautiful action potentials that can be recorded with this automated equipment. The traces shown suggested that there was not alternation in the action potential duration. Is this a consistent finding? How long do last these stable recordings? The only comment is that resting membrane potential seems to be somewhat variable. Can this be resolved? Is it an unexpected veratridine effect? Standardization of maturation methods of ventricular myocytes derived from hiPS will be a big achievement for cardiac cellular electrophysiology which was obliged for years to the imprecise extrapolation of data obtained from a combination of several species none of which was representative of human electrophysiology. The big deal will be the maturation of human atrial myocytes derived from hiPS that fulfil the known characteristics of human atrial cells.

Minor points:

We suggest suppressing the initial sentence of section 3. We surmise that results obtained from the experiments described in this section cannot serve to understand the role of I_NaL in arrhythmogenesis.

Reference

1. Van Driessche W, Lindemann B: Concentration dependence of currents through single sodium-selective pores in frog skin. Nature. 1979; 282 (5738): 519-520 PubMed Abstract | Publisher Full Text

View Article & Reports

Show less

Critical re-review with well-organized points of concern

Sam Schwarzkopf says:

The authors have clarified several of the questions I raised in my previous review. Unfortunately, most of the major problems have not been addressed by this revision. As I stated in my previous review, I deem it unlikely that all those issues can be solved merely by a few added paragraphs. Instead there are still some fundamental concerns with the experimental design and, most critically, with the analysis. This means the strong conclusions put forward by this manuscript are not warranted and I cannot approve the manuscript in this form.

Read full report

The greatest concern is that when I followed the description of the methods in the previous version it was possible to decode, with almost perfect accuracy, any arbitrary stimulus labels I chose. See https://doi.org/10.6084/m9.figshare.1167456 for examples of this reanalysis. Regardless of whether we pretend that the actual stimulus appeared at a later time or was continuously alternating between signal and silence, the decoding is always close to perfect. This is an indication that the decoding has nothing to do with the actual stimulus heard by the Sender but is opportunistically exploiting some other features in the data. The control analysis the authors performed, reversing the stimulus labels, cannot address this problem because it suffers from the exact same problem. Essentially, what the classifier is presumably using is the time that has passed since the recording started.
The reason for this is presumably that the authors used non-independent data for training and testing. Assuming I understand correctly (see point 3), random sampling one half of data samples from an EEG trace are not independent data. Repeating the analysis five times – the control analysis the authors performed – is not an adequate way to address this concern. Randomly selecting samples from a time series containing slow changes (such as the slow wave activity that presumably dominates these recordings under these circumstances) will inevitably contain strong temporal correlations. See TemporalCorrelations.jpg in https://doi.org/10.6084/m9.figshare.1185723 for 2D density histograms and a correlation matrix demonstrating this.
While the revised methods section provides more detail now, it still is unclear about exactly what data were used. Conventional classification analysis report what data features (usual columns in the data matrix) and what observations (usual rows) were used. Anything could be a feature but typically this might be the different EEG channels or fMRI voxels etc. Observations are usually time points. Here I assume the authors transformed the raw samples into a different space using principal component analysis. It is not stated if the dimensionality was reduced using the eigenvalues. Either way, I assume the data samples (collected at 128 Hz) were then used as observations and the EEG channels transformed by PCA were used as features. The stimulus labels were assigned as ON or OFF for each sample. A set of 50% of samples (and labels) was then selected at random for training, and the rest was used for testing. Is this correct?
A powerful non-linear classifier can capitalise on such correlations to discriminate arbitrary labels. In my own analyses I used both an SVM with RBF as well as a k-nearest neighbour classifier, both of which produce excellent decoding of arbitrary stimulus labels (see point 1). Interestingly, linear classifiers or less powerful SVM kernels fare much worse – a clear indication that the classifier learns about the complex non-linear pattern of temporal correlations that can describe the stimulus label. This is further corroborated by the fact that when using stimulus labels that are chosen completely at random (i.e. with high temporal frequency) decoding does not work.
The authors have mostly clarified how the correlation analysis was performed. It is still left unclear, however, how the correlations for individual pairs were averaged. Was Fisher’s z-transformation used, or were the data pooled across pairs? More importantly, it is not entirely surprising that under the experimental conditions there will be some correlation between the EEG signals for different participants, especially in low frequency bands. Again, this further supports the suspicion that the classification utilizes slow frequency signals that are unrelated to the stimulus and the experimental hypothesis. In fact, a quick spot check seems to confirm this suspicion: correlating the time series separately for each channel from the Receiver in pair 1 with those from the Receiver in pair 18 reveals 131 significant (p‹0.05, Bonferroni corrected) out of 196 (14x14 channels) correlations… One could perhaps argue that this is not surprising because both these pairs had been exposed to identical stimulus protocols: one minute of initial silence and only one signal period (see point 6). However, it certainly argues strongly against the notion that the decoding is any way related to the mental connection between the particular Sender and Receiver in a given pair because it clearly works between Receivers in different pairs! However, to further control for this possibility I repeated the same analysis but now comparing the Receiver from pair 1 to the Receiver from pair 15. This pair was exposed to a different stimulus paradigm (2 minutes of initial silence and a longer paradigm with three signal periods). I only used the initial 3 minutes for the correlation analysis. Therefore, both recordings would have been exposed to only one signal period but at different times (at 1 min and 2 min for pair 1 and 15, respectively). Even though the stimulus protocol was completely different the time courses for all the channels are highly correlated and 137 out of 196 correlations are significant. Considering that I used the raw data for this analysis it should not surprise anyone that extracting power from different frequency bands in short time windows will also reveal significant correlations. Crucially, it demonstrates that correlations between Sender and Receiver are artifactual and trivial.
The authors argue in their response and the revision that predictive strategies were unlikely. After having performed these additional analyses I am inclined to agree. The excellent decoding almost certainly has nothing to do with expectation or imagery effects and it is irrelevant whether participants could guess the temporal design of the experiment. Rather, the results are almost entirely an artefact of the analysis. However, this does not mean that predictability is not an issue. The figure StimulusTimecourses.jpg in https://doi.org/10.6084/m9.figshare.1185723 plots the stimulus time courses for all 20 pairs as can be extracted from the newly uploaded data. This confirms what I wrote in my previous review, in fact, with the corrected data sets the problem with predictability is even greater. Out of the 20 pairs, 13 started with 1 min of initial silence. The remaining 7 had 2 minutes of initial silence. Most of the stimulus paradigms are therefore perfectly aligned and thus highly correlated. This also proves incorrect the statement that initial silence periods were 1, 2, or 3 minutes. No pair had 3 min of initial silence. It would therefore have been very easy for any given Receiver to correctly guess the protocol. It should be clear that this is far from optimal for testing such an unorthodox hypothesis. Any future experiments should employ more randomization to decrease predictability. Even if this wasn’t the underlying cause of the present results, this is simply not great experimental design.
The authors now acknowledge in their response that all the participants were authors. They say that this is also acknowledged in the methods section, but I did not see any statement about that in the revised manuscript. As before, I also find it highly questionable to include only authors in an experiment of this kind. It is not sufficient to claim that Receivers weren’t guessing their stimulus protocol. While I am giving the authors (and thus the participants) the benefit of the doubt that they actually believe they weren’t guessing/predicting the stimulus protocols, this does not rule out that they did. It may in fact be possible to make such predictions subconsciously (Now, if you ask me, this is an interesting scientific question someone should do an experiment on!). The fact familiar with the protocol may help that. Any future experiments should take steps to prevent this.
I do not follow the explanation for the binomial test the authors used. Based on the excessive Bayes Factor of 390,625 it is clear that the authors assumed a chance level of 50% on their binomial test. Because the design is not balanced, this is not correct.
In general, the Bayes Factor and the extremely high decoding accuracy should have given the authors reason to start. Considering the unusual hypothesis did the authors not at any point wonder if these results aren’t just far too good to be true? Decoding mental states from brain activity is typically extremely noisy and hardly affords accuracies at the level seen here. Extremely accurate decoding and Bayes Factors in the hundreds of thousands should be a tell-tale sign to check that there isn’t an analytical flaw that makes the result entirely trivial. I believe this is what happened here and thus I think this experiment serves as a very good demonstration for the pitfalls of applying such analysis without sanity checks. In order to make claims like this, the experimental design must contain control conditions that can rule out these problems. Presumably, recordings without any Sender, and maybe even when the “Receiver” is aware of this fact, should produce very similar results.

Based on all these factors, it is impossible for me to approve this manuscript. I should however state that it is laudable that the authors chose to make all the raw data of their experiment publicly available. Without this it would have impossible for me to carry out the additional analyses, and thus the most fundamental problem in the analysis would have remained unknown. I respect the authors’ patience and professionalism in dealing with what I can only assume is a rather harsh review experience. I am honoured by the request for an adversarial collaboration. I do not rule out such efforts at some point in the future. However, for all of the reasons outlined in this and my previous review, I do not think the time is right for this experiment to proceed to this stage. Fundamental analytical flaws and weaknesses in the design should be ruled out first. An adversarial collaboration only really makes sense to me for paradigms were we can be confident that mundane or trivial factors have been excluded.

View Article & Reports

Show less

Positive feedback with specific examples

Gregg Roman says:

This manuscript does an excellent job demonstrating significant strain differences in Burdian's paradigm. Since each Drosophila lab has their own wild type (usually Canton-S) isolate, this issue of strain differences is actually a very important one for between lab reproducibility. This work is a good reminder for all geneticists to pay attention to the population effects in the background controls, and presumably the mutant lines we are comparing.

Read full report

I was very pleased to see the within-isolate behavior was consistent in replicate experiments one year apart. The authors further argue that the between-isolate differences in behavior arise from a Founder's effect, at least in the differences in locomotor behavior between the Paris lines CS_TP and CS_JC. I believe this is a very reasonable and testable hypothesis. It predicts that genetic variability for these traits exist within the populations. It should now be possible to perform selection experiments from the original CS_TP population to replicate the founding event and estimate the heritability of these traits.

Two other things that I liked about this manuscript are the ability to adjust parameters in figure 3, and our ability to download the raw data. After reading the manuscript, I was a little disappointed that the performance of the five strains in each 12 behavioral variables weren't broken down individually in a table or figure. I thought this may help us readers understand what the principle components were representing. The authors have made this data readily accessible in a downloadable spreadsheet.

View Article & Reports

Show less

Concise review with suggestions for a few minor revisions

Gerald Watts says:

This is an exceptionally good review and balanced assessment of the status of CETP inhibitors and ASCVD from a world authority in the field. The article highlights important data that might have been overlooked when promulgating the clinical value of CETPIs and related trials.

Read full report

Only 2 areas need revision:

Page 3, para 2: the notion that these data from Papp et al. convey is critical and the message needs an explicit sentence or two at end of paragraph.
Page 4, Conclusion: the assertion concerning the ethics of the two Phase 3 clinical trials needs toning down. Perhaps rephrase to indicate that the value and sense of doing these trials is open to question, with attendant ethical implications, or softer wording to that effect.

View Article & Reports

Show less

Clear report emphasizes relevance of the article

Jasper Rine says:

The Wiley et al. manuscript describes a beautiful synthesis of contemporary genetic approaches to, with astonishing efficiency, identify lead compounds for therapeutic approaches to a serious human disease. I believe the importance of this paper stems from the applicability of the approach to the several thousand of rare human disease genes that Next-Gen sequencing will uncover in the next few years and the challenge we will have in figuring out the function of these genes and their resulting defects. This work presents a paradigm that can be broadly and usefully applied.

Read full report

In detail, the authors begin with gene responsible for X-linked spinal muscular atrophy and express both the wild-type version of that human gene as well as a mutant form of that gene in S. pombe. The conceptual leap here is that progress in genetics is driven by phenotype, and this approach involving a yeast with no spine or muscles to atrophy is nevertheless and N-dimensional detector of phenotype.

The study is not without a small measure of luck in that expression of the wild-type UBA1 gene caused a slow growth phenotype which the mutant did not. Hence there was something in S. pombe that could feel the impact of this protein. Given this phenotype, the authors then went to work and using the power of the synthetic genetic array approach pioneered by Boone and colleagues made a systematic set of double mutants combining the human expressed UBA1 gene with knockout alleles of a plurality of S. pombe genes. They found well over a hundred mutations that either enhanced or suppressed the growth defect of the cells expressing UBI1. Most of these have human orthologs. My hunch is that many human genes expressed in yeast will have some comparably exploitable phenotype, and time will tell.

Building on the interaction networks of S. pombe genes already established, augmenting these networks by the protein interaction networks from yeast and from human proteome studies involving these genes, and from the structure of the emerging networks, the authors deduced that an E3 ligase modulated UBA1 and made the leap that it therefore might also impact X-linked Spinal Muscular Atrophy.

Here, the awesome power of the model organism community comes into the picture as there is a zebrafish model of spinal muscular atrophy. The principle of phenologs articulated by the Marcotte group inspire the recognition of the transitive logic of how phenotypes in one organism relate to phenotypes in another. With this zebrafish model, they were able to confirm that an inhibitor of E3 ligases and of the Nedd8-E1 activating suppressed the motor axon anomalies, as predicted by the effect of mutations in S. pombe on the phenotypes of the UBA1 overexpression.

I believe this is an important paper to teach in intro graduate courses as it illustrates beautifully how important it is to know about and embrace the many new sources of systematic genetic information and apply them broadly.

View Article & Reports

Show less

Detailed feedback about statistics in a critical review

Michael McCarthy says:

This paper by Amrhein et al. criticizes a paper by Bradley Efron that discusses Bayesian statistics (Efron, 2013a), focusing on a particular example that was also discussed in Efron (2013b). The example concerns a woman who is carrying twins, both male (as determined by sonogram and we ignore the possibility that gender has been observed incorrectly). The parents-to-be ask Efron to tell them the probability that the twins are identical.

Read full report

This is my first open review, so I'm not sure of the protocol. But given that there appears to be errors in both Efron (2013b) and the paper under review, I am sorry to say that my review might actually be longer than the article by Efron (2013a), the primary focus of the critique, and the critique itself. I apologize in advance for this. To start, I will outline the problem being discussed for the sake of readers.

This problem has various parameters of interest. The primary parameter is the genetic composition of the twins in the mother’s womb. Are they identical (which I describe as the state x = 1) or fraternal twins (x = 0)? Let y be the data, with y = 1 to indicate the twins are the same gender. Finally, we wish to obtain Pr(x = 1 | y = 1), the probability the twins are identical given they are the same gender1. Bayes’ rule gives us an expression for this:

Pr(x = 1 | y = 1) = Pr(x=1) Pr(y = 1 | x = 1) / {Pr(x=1) Pr(y = 1 | x = 1) + Pr(x=0) Pr(y = 1 | x = 0)}

Now we know that Pr(y = 1 | x = 1) = 1; twins must be the same gender if they are identical. Further, Pr(y = 1 | x = 0) = 1/2; if twins are not identical, the probability of them being the same gender is 1/2.

Finally, Pr(x = 1) is the prior probability that the twins are identical. The bone of contention in the Efron papers and the critique by Amrhein et al. revolves around how this prior is treated. One can think of Pr(x = 1) as the population-level proportion of twins that are identical for a mother like the one being considered.

However, if we ignore other forms of twins that are extremely rare (equivalent to ignoring coins finishing on their edges when flipping them), one incontrovertible fact is that Pr(x = 0) = 1 − Pr(x = 1); the probability that the twins are fraternal is the complement of the probability that they are identical.

The above values and expressions for Pr(y = 1 | x = 1), Pr(y = 1 | x = 0), and Pr(x = 0) leads to a simpler expression for the probability that we seek ‐ the probability that the twins are identical given they have the same gender:

Pr(x = 1 | y = 1) = 2 Pr(x=1) / [1 + Pr(x=1)] (1)

We see that the answer depends on the prior probability that the twins are identical, Pr(x=1). The paper by Amrhein et al. points out that this is a mathematical fact. For example, if identical twins were impossible (Pr(x = 1) = 0), then Pr(x = 1| y = 1) = 0. Similarly, if all twins were identical (Pr(x = 1) = 1), then Pr(x = 1| y = 1) = 1. The “true” prior lies somewhere in between. Apparently, the doctor knows that one third of twins are identical2. Therefore, if we assume Pr(x = 1) = 1/3, then Pr(x = 1| y = 1) = 1/2.

Now, what would happen if we didn't have the doctor's knowledge? Laplace's “Principle of Insufficient Reason” would suggest that we give equal prior probability to all possibilities, so Pr(x = 1) = 1/2 and Pr(x = 1| y = 1) = 2/3, an answer different from 1/2 that was obtained when using the doctor's prior of 1/3.

Efron(2013a) highlights this sensitivity to the prior, representing someone who defines an uninformative prior as a “violator”, with Laplace as the “prime violator”. In contrast, Amrhein et al. correctly points out that the difference in the posterior probabilities is merely a consequence of mathematical logic. No one is violating logic – they are merely expressing ignorance by specifying equal probabilities to all states of nature. Whether this is philosophically valid is debatable (Colyvan 2008), but weight to that question, and it is well beyond the scope of this review. But setting Pr(x = 1) = 1/2 is not a violation; it is merely an assumption with consequences (and one that in hindsight might be incorrect2).

Alternatively, if we don't know Pr(x = 1), we could describe that probability by its own probability distribution. Now the problem has two aspects that are uncertain. We don’t know the true state x, and we don’t know the prior (except in the case where we use the doctor’s knowledge that Pr(x = 1) = 1/3). Uncertainty in the state of x refers to uncertainty about this particular set of twins. In contrast, uncertainty in Pr(x = 1) reflects uncertainty in the population-level frequency of identical twins. A key point is that the state of one particular set of twins is a different parameter from the frequency of occurrence of identical twins in the population.

Without knowledge about Pr(x = 1), we might use Pr(x = 1) ~ dunif(0, 1), which is consistent with Laplace. Alternatively, Efron (2013b) notes another alternative for an uninformative prior: Pr(x = 1) ~ dbeta(0.5, 0.5), which is the Jeffreys prior for a probability.

Here I disagree with Amrhein et al.; I think they are confusing the two uncertain parameters. Amrhein et al. state:

“We argue that this example is not only flawed, but useless in illustrating Bayesian data analysis because it does not rely on any data. Although there is one data point (a couple is due to be parents of twin boys, and the twins are fraternal), Efron does not use it to update prior knowledge. Instead, Efron combines different pieces of expert knowledge from the doctor and genetics using Bayes’ theorem.”

This claim might be correct when describing uncertainty in the population-level frequency of identical twins. The data about the twin boys is not useful by itself for this purpose – they are a biased sample (the data have come to light because their gender is the same; they are not a random sample of twins). Further, a sample of size one, especially if biased, is not a firm basis for inference about a population parameter. While the data are biased, the claim by Amrheim et al. that there are no data is incorrect.

However, the data point (the twins have the same gender) is entirely relevant to the question about the state of this particular set of twins. And it does update the prior. This updating of the prior is given by equation (1) above. The doctor’s prior probability that the twins are identical (1/3) becomes the posterior probability (1/2) when using information that the twins are the same gender. The prior is clearly updated with Pr(x = 1| y = 1) ≠ Pr(x = 1) in all but trivial cases; Amrheim et al.’s statement that I quoted above is incorrect in this regard.

This possible confusion between uncertainty about these twins and uncertainty about the population level frequency of identical twins is further suggested by Amrhein et al.’s statements:

“Second, for the uninformative prior, Efron mentions erroneously that he used a uniform distribution between zero and one, which is clearly different from the value of 0.5 that was used. Third, we find it at least debatable whether a prior can be called an uninformative prior if it has a fixed value of 0.5 given without any measurement of uncertainty.”

Note, if the prior for Pr(x = 1) is specified as 0.5, or dunif(0,1), or dbeta(0.5, 0.5), the posterior probability that these twins are identical is 2/3 in all cases. Efron (2013b) says the different priors lead to different results, but this result is incorrect, and the correct answer (2/3) is given in Efron (2013a)3. Nevertheless, a prior that specifies Pr(x = 1) = 0.5 does indicate uncertainty about whether this particular set of twins is identical (but certainty in the population level frequency of twins). And Efron’s (2013a) result is consistent with Pr(x = 1) having a uniform prior. Therefore, both claims in the quote above are incorrect.

It is probably easiest to show the (lack of) influence of the prior using MCMC sampling. Here is WinBUGS code for the case using Pr(x = 1) = 0.5.

model
{
  pr_ident_twins <- 0.5   # prior probability that the twins are identical
  x ~ dbern(pr_ident_twins)   # are they identical? If so, x = 1, and 0 otherwise
 
  pr_same_gender <- x + (1-x)*0.5   # the probability that the twins have the same gender. It equals 1 if x = 1, and 0.5 otherwise (i.e., if x = 0)
 
  same_gender <- 1  # the single data point - the twins are the same gender
  same_gender ~ dbern(pr_same_gender)   # those data arise as a Bernoulli sample with probability pr_same_gender
}

Running this model in WinBUGS shows that the posterior mean of x is 2/3; this is the posterior probability that x = 1.

Instead of using pr_ident_twins <- 0.5, we could set this probability as being uncertain and define pr_ident_twins ~ dunif(0,1), or pr_ident_twins ~ dbeta(0.5,0.5). In either case, the posterior mean value of x remains 2/3 (contrary to Efron 2013b, but in accord with the correction in Efron 2013a).

Note, however, that the value of the population level parameter pr_ident_twins is different in all three cases. In the first it remains unchanged at 1/2 where it was set. In the case where the prior distribution for pr_ident_twins is uniform or beta, the posterior distributions remain broad, but they differ depending on the prior (as they should – different priors lead to different posteriors 4). However, given the biased sample size of 1, the posterior distribution for this particular parameter is likely to be misleading as an estimate of the population-level frequency of twins.

So why doesn’t the choice of prior influence the posterior probability that these twins are identical? Well, for these three priors, the prior probability that any single set of twins is identical is 1/2 (this is essentially the mean of the prior distributions in these three cases).

If, instead, we set the prior as dbeta(1,2), which has a mean of 1/3, then the posterior probability that these twins are identical is 1/2. This is the same result as if we had set Pr(x = 1) = 1/3. In both these cases (choosing dbeta(1,2) or 1/3), the prior probability that a single set of twins is identical is 1/3, so the posterior is the same (1/2) given the data (the twins have the same gender).

Further, Amrhein et al. also seem to misunderstand the data. They note:

“Although there is one data point (a couple is due to be parents of twin boys, and the twins are fraternal)...”

This is incorrect. The parents simply know that the twins are both male. Whether they are fraternal is unknown (fraternal twins being the complement of identical twins) – that is the question the parents are asking. This error of interpretation makes the calculations in Box 1 and subsequent comments irrelevant.

Box 1 also implies Amrhein et al. are using the data to estimate the population frequency of identical twins rather than the state of this particular set of twins. This is different from the aim of Efron (2013a) and the stated question.

Efron suggests that Bayesian calculations should be checked with frequentist methods when priors are uncertain. However, this is a good example where this cannot be done easily, and Amrhein et al. are correct to point this out. In this case, we are interested in the probability that the hypothesis is true given the data (an inverse probability), not the probabilities that the observed data would be generated given particular hypotheses (frequentist probabilities). If one wants the inverse probability (the probability the twins are identical given they are the same gender), then Bayesian methods (andtherefore a prior) are required. A logical answer simply requires that the prior is constructed logically. Whether that answer is “correct” will be, in most cases, only known in hindsight.

However, one possible way to analyse this example using frequentist methods would be to assess the likelihood of obtaining the data for each of the two hypothesis (the twins are identical or fraternal). The likelihood of the twins having the same gender under the hypothesis that they are identical is 1. The likelihood of the twins having the same gender under the hypothesis that they are fraternal is 0.5. Therefore, the weight of evidence in favour of identical twins is twice that of fraternal twins. Scaling these weights so they sum to one (Burnham and Anderson 2002), gives a weight of 2/3 for identical twins and 1/3 for fraternal twins. These scaled weights have the same numerical values as the posterior probabilities based on either a Laplace or Jeffreys prior. Thus, one might argue that the weight of evidence for each hypothesis when using frequentist methods is equivalent to the posterior probabilities derived from an uninformative prior. So, as a final aside in reference to Efron (2013a), if we are being “violators” when using a uniform prior, are we also being “violators” when using frequentist methods to weigh evidence? Regardless of the answer to this rhetorical question, “checking” the results with frequentist methods doesn’t give any more insight than using uninformative priors (in this case). However, this analysis shows that the question can be analysed using frequentist methods; the single data point is not a problem for this. The claim in Armhein et al. that a frequentist analyis "is impossible because there is only one data point, and frequentist methods generally cannot handle such situations" is not supported by this example.

In summary, the comment by Amrhein et al. raises some interesting points that seem worth discussing, but it makes important errors in analysis and interpretation, and misrepresents the results of Efron (2013a). This means the current version should not be approved.

References

Burnham, K.P. & D.R. Anderson. 2002. Model Selection and Multi-model Inference: a Practical Information-theoretic Approach. Springer-Verlag, New York.

Colyvan, M. 2008. Is Probability the Only Coherent Approach to Uncertainty? Risk Anal. 28: 645-652.

Efron B. (2003a) Bayes’ Theorem in the 21st Century. Science 340(6137): 1177-1178.

Efron B. (2013b) A 250-year argument: Belief, behavior, and the bootstrap. Bull Amer. Math Soc. 50: 129-146.

Footnotes

The twins are both male. However, if the twins were both female, the statistical results would be the same, so I will simply use the data that the twins are the same gender.
In reality, the frequency of twins that are identical is likely to vary depending on many factors but we will accept 1/3 for now.
Efron (2013b) reports the posterior probability for these twins being identical as “a whopping 61.4% with a flat Laplace prior” but as 2/3 in Efron (2013a). The latter (I assume 2/3 is “even more whopping”!) is the correct answer, which I confirmed via email with Professor Efron. Therefore, Efron (2013b) incorrectly claims the posterior probability is sensitive to the choice between a Jeffreys or Laplace uninformative prior.
When the data are very informative relative to the different priors, the posteriors will be similar, although not identical.

View Article & Reports

Show less

Constructive feedback on an opinion article

Mark Parsons and Peter Fox say:

I am very glad the authors wrote this essay. It is a well-written, needed, and useful summary of the current status of “data publication” from a certain perspective. The authors, however, need to be bolder and more analytical. This is an opinion piece, yet I see little opinion. A certain view is implied by the organization of the paper and the references chosen, but they could be more explicit.

Read full report

The paper would be both more compelling and useful to a broad readership if the authors moved beyond providing a simple summary of the landscape and examined why there is controversy in some areas and then use the evidence they have compiled to suggest a path forward. They need to be more forthright in saying what data publication means to them, or what parts of it they do not deal with. Are they satisfied with the Lawrence et al. definition? Do they accept the critique of Parsons and Fox? What is the scope of their essay?

The authors take a rather narrow view of data publication, which I think hinders their analyses. They describe three types of (digital) data publication: Data as a supplement to an article; data as the subject of a paper; and data independent of a paper. The first two types are relatively new and they represent very little of the data actually being published or released today. The last category, which is essentially an “other” category, is rich in its complexity and encompasses the vast majority of data released. I was disappointed that the examples of this type were only the most bare-bones (Zenodo and Figshare). I think a deeper examination of this third category and its complexity would help the authors better characterize the current landscape and suggest paths forward.

Some questions the authors might consider: Are these really the only three models in consideration or does the publication model overstate a consensus around a certain type of data publication? Why are there different models and which approach is better for different situations? Do they have different business models or imply different social contracts? Might it also be worthy of typing “publishers” instead of “publications”? For example, do domain repositories vs. institutional repositories vs. publishers address the issues differently? Are these models sustaining models or just something to get us through the next 5-10 years while we really figure it out?

I think this oversimplification inhibited some deeper analysis in other areas as well. I would like to see more examination of the validation requirement beyond the lens of peer review, and I would like a deeper examination of incentives and credit beyond citation.

I thought the validation section of the paper was very relevant, but somewhat light. I like the choice of the term validation as more accurate than “quality” and it fits quite well with Callaghan’s useful distinction between technical and scientific review, but I think the authors overemphasize the peer-review style approach. The authors rightly argue that “peer-review” is where the publication metaphor leads us, but it may be a false path. They overstate some difficulties of peer-review (No-one looks at every data value? No, they use statistics, visualization, and other techniques.) while not fully considering who is responsible for what. We need a closer examination of different roles and who are appropriate validators (not necessarily conventional peers). The narrowly defined models of data publication may easily allow for a conventional peer-review process, but it is much more complex in the real-world “other” category. The authors discuss some of this in what they call “independent data validation,” but they don’t draw any conclusions.

Only the simplest of research data collections are validated only by the original creators. More often there are teams working together to develop experiments, sampling protocols, algorithms, etc. There are additional teams who assess, calibrate, and revise the data as they are collected and assembled. The authors discuss some of this in their examples like the PDS and tDAR, but I wish they were more analytical and offered an opinion on the way forward. Are there emerging practices or consensus in these team-based schemes? The level of service concept illustrated by Open Context may be one such area. Would formalizing or codifying some of these processes accomplish the same as peer-review or more? What is the role of the curator or data scientist in all of this? Given the authors’s backgrounds, I was surprised this role was not emphasized more. Finally, I think it is a mistake for science review to be the main way to assess reuse value. It has been shown time and again that data end up being used effectively (and valued) in ways that original experts never envisioned or even thought valid.

The discussion of data citation was good and captured the state of the art well, but again I would have liked to see some views on a way forward. Have we solved the basic problem and are now just dealing with edge cases? Is the “just-in-time identifier” the way to go? What are the implications? Will the more basic solutions work in the interim? More critically, are we overemphasizing the role of citation to provide academic credit? I was gratified that the authors referenced the Parsons and Fox paper which questions the whole data publication metaphor, but I was surprised that they only discussed the “data as software” alternative metaphor. That is a useful metaphor, but I think the ecosystem metaphor has broader acceptance. I mention this because the authors critique the software metaphor because “using it to alter or affect the academic reward system is a tricky prospect”. Yet there is little to suggest that data publication and corresponding citation alters that system either. Indeed there is little if any evidence that data publication and citation incentivize data sharing or stewardship. As Christine Borgman suggests, we need to look more closely at who we are trying to incentivize to do what. There is no reason to assume it follows the same model as research literature publication. It may be beyond the scope of this paper to fully examine incentive structures, but it at least needs to be acknowledged that building on the current model doesn’t seem to be working.

Finally, what is the takeaway message from this essay? It ends rather abruptly with no summary, no suggested directions or immediate challenges to overcome, no call to action, no indications of things we should stop trying, and only brief mention of alternative perspectives. What do the authors want us to take away from this paper?

Overall though, this is a timely and needed essay. It is well researched and nicely written with rich metaphor. With modifications addressing the detailed comments below and better recognizing the complexity of the current data publication landscape, this will be a worthwhile review paper. With more significant modification where the authors dig deeper into the complexities and controversies and truly grapple with their implications to suggest a way forward, this could be a very influential paper. It is possible that the definitions of “publication” and “peer-review” need not be just stretched but changed or even rejected.

Detailed Comments:

The whole paper needs a quick copy edit. There are a few typos, missing words, and wrong verb tenses. Note the word “data” is a plural noun. E.g., Data are not software, nor are they literature. (NSICD, instead of NSIDC)
Page 2, para 2: “citability is addressed by assigning a PID.” This is not true, as the authors discuss on page 4, para 4. Indeed, page 4, para 4 seems to contradict itself. Citation is more than a locator/identifier.
In the discussion of “Data independent of any paper” it is worth noting that there may often be linkages between these data and myriad papers. Indeed a looser concept of a data paper has existed for some time, where researchers request a citation to a paper even though it is not the data nor fully describes the data (e.g the CRU temp records)
Page 4, para 1: I’m not sure it’s entirely true that published data cannot involve requesting permission. In past work with Indigenous knowledge holders, they were willing to publish summary data and then provide the details when satisfied the use was appropriate and not exploitive. I think those data were “published” as best they could be. A nit, perhaps, but it highlights that there are few if any hard and fast rules about data publication.
Page 4, para 2: You may also want to mention the WDS certification effort, which is combining with the DSA via an RDA Working Group:
Page 4, para 2: The joint declaration of data citation principles involved many more organizations than Force11, CODATA, and DCC. Please credit them all (maybe in a footnote). The glory of the effort was that it was truly a joint effort across many groups. There is no leader. Force11 was primarily a convener.
Page 4, para 6: The deep citation approach recommended by ESIP is not to just to list variables or a range of data. It is to identify a “structural index” for the data and to use this to reference subsets. In Earth science this structural index is often space and time, but many other indices are possible--location in a gene sequence, file type, variable, bandwidth, viewing angle, etc. It is not just for “straightforward” data sets.
Page 5, para 5: I take issue with the statement that few repositories provide scientific review. I can think of a couple dozen that do just off the top of my head, and I bet most domain repositories have some level of science review. The “scientists” may not always be in house, but the repository is a team facilitator. See my general comments.
Page 5, para 10: The PDS system is only unusual in that it is well documented and advertised. As mentioned, this team style approach is actually fairly common.
Page 6, para 3: Parsons and Fox don’t just argue that the data publication metaphor is limiting. They also say it is misleading. That should be acknowledged at least, if not actively grappled with.

View Article & Reports

Show less

Response to authors who did not make sufficient changes to article

Leo McHugh says:

Artifact removal: Unfortunately the authors have not updated the paper with a 2x2 table showing guns and smiles by removed data points. This could dispel criticism that an asymmetrical expectation bias that has been shown to exist in similar experiments is not driving a bias leading to inappropriate conclusions.

Read full report

Artifact removal: Unfortunately the authors have not updated the paper with a 2x2 table showing guns and smiles by removed data points. This could dispel criticism that an asymmetrical expectation bias that has been shown to exist in similar experiments is not driving a bias leading to inappropriate conclusions. This is my strongest criticism of the paper and should be easily addressed as per my previous review comment. The fact that this simple data presentation was not performed to remove a clear potential source of spurious results is disappointing.
The authors have added 95% CIs to figures S1 and S2. This clarifies the scope for expectation bias in these data. The addition of error bars permits the authors’ assumption of a linear trend, indicating that the effect of sequences of either guns or smiles may not skew results. Equally, there could be either a downwards or upwards trend fitting within the confidence intervals that could be indicative of a cognitive bias that may violate the assumptions of the authors, leading to spurious results. One way to remove these doubts could be to stratify the analyses by the length of sequences of identical symbols. If the results hold up in each of the strata, this potential bias could be shown to not be present in the data. If the bias is strong, particularly in longer runs, this could indicate that the positive result was due to small numbers of longer identical runs combined with a cognitive bias rather than an ability to predict future events.

View Article & Reports

Show less

Suggestions for further improvement of an approved article

Gavin Simpson says:

Chamberlain and Szöcs present the taxize R package, a set of functions that provides interfaces to several web tools and databases, and simplifies the process of checking, updating, correcting and manipulating taxon names for researchers working with ecological/biological data. A key feature that is repeated throughout is the need for reproducibility of science workflows and taxize provides a means to achieve this within the R software ecosystem for taxonomic search.

Read full report

The manuscript is well-written and nicely presented, with a good balance of descriptive text and discourse and practical illustration of package usage. A number of examples illustrate the scope of the package, something that is fully expanded upon in the two appendices, which are a welcome addition to the paper.

As to the package, I am not overly fond of long function names; the authors should consider dropping the data source abbreviations from the function names in a future update/revision of the package. Likewise there is some inconsistency in the naming conventions used. For example there is the ’tpl_search()’ function to search The Plant List, but the equivalent function to search uBio is ’ubio_namebank()’. Whilst this may reflect specific aspects of terminology in use at the respective data stores, it does not help the user gain familiarity with the package by having them remember inconsistent function names.

One advantage of taxize is that it draws together a rich selection of data stores to query. A further suggestion for a future update would be to add generic function names, that apply to a database connection/information object. The latter would describe the resource the user wants to search and any other required information, such as the API key, etc., for example:

foo <- taxizeDB(what = "uBio", key = "1646546164694")

The user function to search would then be ’search(foo, "Abies")’. Similar generically named functions would provide the primary user-interface, thus promoting a more consistent toolbox at the R level. This will become increasingly relevant as the scope of taxize increases through the addition of new data stores that the package can access.

In terms of presentation in the paper, I really don’t like the way the R code inputs merge with the R outputs. I know the author of Knitr doesn’t like the demarcation of output being polluted by the R prompt, but I do find it difficult parsing the inputs/outputs you show because often there is no space between them and users not familiar with R will have greater difficulties than I. Consider adding in more conventional indications of R outputs, or physically separate input from output by breaking up the chunks of code to have whitespace between the grey-background chunks. Related, in one location I noticed something amiss with the layout; in the first code block at the top of page 5, the printed output looks wrong here. I would expect the attributes to print on their own line and the data in the attribute to also be on its own separate line.

Note also, the inconsistency in the naming of the output object columns. For example, in the two code chunks shown in column 1 of page 4, the first block has an object printed with column names ’matched_name’ and ’data_source_title’, whilst camelCase is used in the outputs shown in the second block. As the package is revised and developed, consider this and other aspects of providing a consistent presentation to the user.

I was a little confused about the example in the section Resolve Taxonomic Names on page 4. Should the taxon name be “Helianthus annuus” or “Helianthus annus”? In the ‘mynames’ definition you include ‘Helianthus annuus’ in the character vector but the output shown suggests that the submitted name was ‘Helianthus annus’ (1 “u”) in rows with rownames 9 and 10 in the output shown.

Other than that there were the following minor observations:

Abstract: replace “easy” with “simple” in “...fashion that’s easy...”, and move the details about availability and the URI to the end of the sentence.
Page 2, Column 1, Paragraph 2: You have “In addition, there is no one authoritative taxonomic names source...”, which is a little clumsy to read. How about “In addition, there is no one authoritative source of taxonomic names... ”?
Pg 2, C1, P2-3: The abbreviated data sources are presented first (in paragraph 2) and subsequently defined (in para 3). Restructure this so that the abbreviated forms are explained upon first usage.
Pg 2, C2, P2: Most R packages are “in development” so I would drop the qualifier and reword the opening sentence of the paragraph.
Pg 2, C2, P6: Change “and more can easily be added” to “and more can be easily added” seems to flow better?
Pg 5, paragraph above Figure 1: You refer to converting the object to an **ape** *phylo* object and then repeat essentially the same information in the next sentence. Remove the repetition.
Pg 6, C1: The header may be better as “Which taxa are children of the taxon of interest”.
Pg 6: In the section “IUCN status”, the term “we” is used to refer to both the authors and the user. This is confusing. Reserve “we” for reference to the authors and use something else (“a user” perhaps) for the other instances. Check this throughout the entire manuscript.
Pg 6, C2: in the paragraph immediately below the ‘grep()’ for “RAG1”, two consecutive sentences begin with “However”.
Pg 7: The first sentence of “Aggregating data....” reads “In biology, one can asks questions...”. It should be “one asks” or “one can ask”.
Pg 7, Conclusions: The first sentence reads “information is increasingly sought out by biologists”. I would drop “out” as “sought” is sufficient on its own.
Appendices: Should the two figures in the Appendices have a different reference to differentiate them from Figure 1 in the main body of the paper? As it stands, the paper has two Figure 1s, one on page 5 and a second on page 12 in the Appendix.
On Appendix Figure 2: The individual points are a little large. Consider reducing the plotting character size. I appreciate the effect you were going for with the transparency indicating density of observation through overplotting, but the effect is weakened by the size of the individual points.
Should the phylogenetic trees have some scale to them? I presume the height of the stems is an indication of phylogenetic distance but the figure is hard to calibrate without an associated scale. A quick look at Paradis (2012) Analysis of Phylogenetics and Evolution with R would suggest however that a scale is not consistently applied to these trees. I am happy to be guided by the authors as they will be more familiar with the conventions than I.

View Article & Reports

Show less

Peer review of a review article

Gustavo Gutierrez Gonzalez says:

Read full report

Hydbring and Badalian-Very summarize in this review, the current status in the potential development of clinical applications based on miRNAs’ biology. The article gives an interesting historical and scientific perspective on a field that has only recently boomed; focusing mostly on the two main products in the pipeline of several biotech companies (in Europe and USA) which work with miRNAs-based agents, disease diagnostics and therapeutics. Interestingly, not only the specific agents that are being produced are mentioned, but also clever insights in the important cellular pathways regulated by key miRNAs are briefly discussed.

Minor points to consider in subsequent versions:

Page 2; paragraph ‘Genomic location and transcription of microRNAs’: the concept of miRNA clusters and precursors could be a bit better explained.
Page 2; paragraph ‘Genomic location and transcription of microRNAs’: when discussing the paper by the laboratory of Richard Young (reference 16); I think it is important to mention that that particular study refers to stem cells.
Page 2; paragraph ‘Processing of microRNAs’: “Argonate” should be replaced by “Argonaute”.
Page 3; paragraph ‘MicroRNAs in disease diagnostics’: are miR-15a and 16-1 two different miRNAs? I suggest mentioning them as: miR-15a and miR-16-1 and not using a slash sign (/) between them.
Page 4; paragraph ‘Circulating microRNAs’: I am a bit bothered by the description of multiple sclerosis (MS) only as an autoimmune disease. Without being an expert in the field, I believe that there are other hypotheses related to the etiology of MS.
Page 5; paragraph ‘Clinical microRNA diagnostics’: Does ‘hsa’ in hsa-miR-205 mean something?
Page 5; paragraph ‘Clinical microRNA diagnostics’: the authors mention the company Asuragen, Austin, TX, USA but they do not really say anything about their products. I suggest to either remove the reference to that company or to include their current pipeline efforts.
Page 6; paragraph ‘MicroRNAs in therapeutics’: in the first paragraph the authors suggest that miRNAs-based therapeutics should be able to be applied with “minimal side-effects”. Since one miRNA can affect a whole gene program, I found this a bit counterintuitive; I was wondering if any data has been published to support that statement. Also, in the same paragraph, the authors compare miRNAs to protein inhibitors, which are described as more specific and/or selective. I think there are now good indications to think that protein inhibitors are not always that specific and/or selective and that such a property actually could be important for their evidenced therapeutic effects.
Page 6; paragraph ‘MicroRNAs in therapeutics’: I think the concept of “antagomir” is an important one and could be better highlighted in the text.
Throughout the text (pages 3, 5, 6, and 7): I am a bit bothered by separating the word “miRNA” or “miRNAs” at the end of a sentence in the following way: “miR-NA” or “miR-NAs”. It is a bit confusing considering the particular nomenclature used for miRNAs. That was probably done during the formatting and editing step of the paper.
I was wondering if the authors could develop a bit more the general concept that seems to indicate that in disease (and in particular in cancer) the expression and levels of miRNAs are in general downregulated. Maybe some papers have been published about this phenomenon?

View Article & Reports

Show less

Review of an article describing negative results

Christine Mummery says:

The authors describe their attempt to reproduce a study in which it was claimed that mild acid treatment was sufficient to reprogramme postnatal splenocytes from a mouse expressing GFP in the oct4 locus to pluripotent stem cells. The authors followed a protocol that has recently become available as a technical update of the original publication.

Read full report

They report obtaining no pluripotent stem cells expressing GFP driven over the same time period of several days described in the original publication. They describe observation of some green fluorescence that they attributed to autofluorescence rather than GFP since it coincided with PI positive dead cells. They confirmed the absence of oct4 expression by RT-PCR and also found no evidence for Nanog or Sox2, also markers of pluripotent stem cells.

The paper appears to be an authentic attempt to reproduce the original study, although the study might have had additional value with more controls: “failure to reproduce” studies need to be particularly well controlled.

Examples that could have been valuable to include are:

For the claim of autofluorescence: the emission spectrum of the samples would likely have shown a broad spectrum not coincident with that of GFP.
The reprogramming efficiency of postnatal mouse splenocytes using more conventional methods in the hands of the authors would have been useful as a comparison. Idem the lung fibroblasts.
There are no positive control samples (conventional mESC or miPSC) in the qPCR experiments for pluripotency markers. This would have indicated the biological sensitivity of the assay.
Although perhaps a sensitive issue, it might have been helpful if the authors had been able to obtain samples of cells (or their mRNA) from the original authors for simultaneous analysis.

In summary, this is a useful study as it is citable and confirms previous blog reports, but it could have been improved by more controls.

View Article & Reports

Show less

Comments on a case report

Atanaska Elenkova says:

The article is well written, treats an actual problem (the risk of development of valvulopathy after long-term cabergoline treatment in patients with macroprolactinoma) and provides evidence about the reversibility of valvular changes after timely discontinuation of DA treatment.

Read full report

Title and abstract: The title is appropriate for the content of the article. The abstract is concise and accurately summarizes the essential information of the paper although it would be better if the authors define more precisely the anatomic specificity of valvulopathy – mild mitral regurgitation.

Case report: The clinical case presentation is comprehensive and detailed but there are some minor points that should be clarified:

Please clarify the prolactin levels at diagnosis. In the Presentation section (line 3) “At presentation, prolactin level was found to be greater than 1000 ng/ml on diluted testing” but in the section describing the laboratory evaluation at diagnosis (line 7) “Prolactin level was 55 ng/ml”. Was the difference due to so called “hook effect”?
Figure 1: In the text the follow-up MR imaging is indicated to be “after 10 months of cabergoline treatment”. However, the figures 1C and 1D represent 2 years post-treatment MR images. Please clarify.
Figure 2: Echocardiograms 2A and 2B are defined as baseline but actually they correspond to the follow-up echocardiographic assessment at the 4th year of cabergoline treatment. Did the patient undergo a baseline (prior to dopamine agonist treatment) echocardiographic evaluation? If he did not, it should be mentioned as study limitation in the Discussion section.
The mitral valve thickness was mentioned to be normal. Did the echographic examination visualize increased echogenicity (hyperechogenicity) of the mitral cusps?
How could you explain the decrease of LV ejection fraction (from 60-65% to 50-55%) after switching from cabergoline to bromocriptine treatment and respectively its increase to 62% after doubling the bromocriptine daily dose? Was LV function estimated always by the same method during the follow-up?
Final paragraph: Authors conclude that early discontinuation and management with bromocriptine may be effective in reversing cardiac valvular dysfunction. Even though, regular echocardiographic follow up should be considered in patients who are expected to be on long-term high dose treatment with bromocriptine regarding its partial 5-HT2b agonist activity.

View Article & Reports

Show less

Discussing sample size of a science communication study

Sarah Davies says:

Read full report

This is an interesting topic: as the authors note, the way that communicators imagine their audiences will shape their output in significant ways. And I enjoyed what clearly has the potential to be a very rich data set. But I have some reservations about the adequacy of that data set, as it currently stands, given the claims the authors make; the relevance of the analytical framework(s) they draw upon; and the extent to which their analysis has offered significant new insights ‐ by which I mean, I would be keen to see the authors push their discussion further. My suggestions are essentially that they extend the data set they are working with to ensure that their analysis is both rigorous and generalisable, an re-consider the analytical frame they use. I will make some more concrete comments below.

With regard to the data: my feeling is that 14 interviews is a rather slim data set, and that this is heightened by the fact that they were all carried out in a single location, and recruited via snowball sampling and personal contacts. What efforts have the authors made to ensure that they are not speaking to a single, small, sub-community in the much wider category of science communicators? ‐ a case study, if you like, of a particular group of science communicators in North Carolina? In addition, though the authors reference grounded theory as a method for analysis, I got little sense of the data reaching saturation. The reliance on one-off quotes, and on the stories and interests of particular individuals, left me unsure as to how representative interview extracts were. I would therefore recommend either that the data set is extended by carrying out more interviews, in a wider variety of locations (e.g. other sites in the US), or that it is redeveloped as a case study of a particular local professional community. (Which would open up some fascinating questions ‐ how many of these people know each other? What spaces, online or offline, do they interact in, and do they share knowledge, for instance about their audiences? Are there certain touchstone events or publics they communally make reference to?)

As a more minor point with regard to the data set and what the authors want it to do, there were some inconsistencies as to how the study was framed. On p.2 they variously describe the purpose as to “understand the experiences and perspectives of science communicators” and the goals as identifying “the basic interests and value orientations attributed to lay audiences by science communicators”. Later, on p.5, they note that the “research is inductive and seeks to build theory rather than generalizable claims”, while in the Discussion they talk again about having identified communicators‘ “personal motivations” (p.12). There are a number of questions left hanging: is the purpose to understand communicator experiences ‐ in which case why focus on perceptions of audiences? Where is theory being built, and in what ways can this be mobilised in future work? The way that the study is framed and argued as a whole needs, I would suggest, to be clarified.

Relatedly, my sense is that some of this confusion is derived from what I find a rather busy analytical framework. I was not convinced of the value of combining inductive and deductive coding: if the ‘human value typology’ the authors use is ‘universal’, then what is added by open coding? Or, alternatively, why let their open coding, and their findings from this, be constrained by an additional, rather rigid, framework? The addition of the considerable literature on news values to the mix makes the discussion more confusing again. I would suggest that the authors either make much more clear the value of combining these different approaches ‐ building new theory outlining how they relate, and can be jointly mobilised in practice ‐ or fix on one. (My preference would be to focus on the findings from the open coding ‐ but that reflects my own disciplinary biases.)

A more minor analytical point: the authors note that their interviewees come from slightly different professions, and communicate through different formats, have different levels of experience, and different educational backgrounds ‐ but as far as I can see there is no comparative analysis based on this. Were there noticeable differences in the interview talk based on these categorisations? Or was the data set too small to identify any potential contrasts or themes? A note explaining this would be useful.

My final point has reference to the potential that this data set has, particularly if it is extended and developed. I would like to encourage the authors to take their analysis further: at the moment, I was not particularly surprised by the ways in which the communicators referenced news values or imagined their audiences. But it seems to me that the analytical work is not yet complete. What does it mean that communicators imagine audience values and preferences in the way that they do ‐ who is included and excluded by these imaginations? One experiment might be to consider what ‘ideal type’ publics are created in the communicators’ talk. What are the characteristics of the audiences constructed in the interviews and ‐ presumably ‐ in the communicative products of interviewees? What would these people look like? There are also some tantalizing hints in the Discussion that are not really discussed in the Findings ‐ of, for instance, the way in which communicator’s personal motivations may combine with their perceptions of audiences to shape their products. How does this happen? These are, of course, suggestions. But my wider point is that the authors need to show more clearly what is original and useful in their findings ‐ what it is, exactly, that will be important to other scholars in the field.

I hope my comments make sense ‐ please do not hesitate to contact me if not.

View Article & Reports

Show less

Reviewing bioinformatics software as well as the article

Jordi Deu-Pons says:

This is an interesting article and piece of software. I think it contributes towards further alternatives to easily visualize high dimensionality data on the web. It’s simple and easy to embed into other web frameworks or applications.

Read full report

Minor revisions

a) About the software

CSV format. It was hard to guess the expected format. The authors need to add a syntax description of the CSV format at the help page.
Simple HTML example. It will be easy to test HeatmapViewer (HmV) if you add a simple downloadable example file with the minimum required HTML-JavaScript to set up a HmV (without all the CSV import code).
Color scale. HmV only implements a simple three point linear color scale. For me this is the major weakness of HmV. It will be very convenient that in the next HmV release the user can give as a parameter a function that manages the score to color conversion.

b) About the paper

Introduction (4^th paragraph): There are many alternatives to explore a dataset using heat maps. The author only cites two and it’s not clear if you refer to “JavaScript” or “web” alternatives. I think that you have to emphasize the strengths of HmV in comparison to other alternatives (in my opinion, one strength is that it is a good lightweight alternative to embed heat maps in a web report). Example of alternatives that I know of (but I’m sure that there are many more) are:
Predicted protein mutability landscape: The authors say: “Without using a tool such as the HeatmapViewer, we could hardly obtain an overview of the protein mutability landscape”. This paragraph seems to suggest that you can explore the data with HmV. I think that HmV is a good tool to report your data, but not to explore it.
Conclusions: The authors say: “... provides a new, powerful way to generate and display matrix data in web presentations and in publications.” To use heat maps in web presentations and publications is nothing new. I think that HmV makes it easier and user-friendly, but it’s not new.

View Article & Reports

Show less

A helpful review with advice for improvement

John Banks says:

Read full report

This article addresses the links between habitat condition and an endangered bird species in an important forest reserve (ASF) in eastern Kenya. It addresses an important topic, especially given ongoing anthropogenic pressures on this and similar types of forest reserves in eastern Kenya and throughout the tropics. Despite the rather small temporal and spatial extent of the study, it should make an important contribution to bird and forest conservation. There are a number of issues with the methods and analysis that need to be clarified/addressed however; furthermore, some of the conclusions overreach the data collected, while other important results are given less emphasis that they warrant. Below are more specific comments by section:

Abstract:

The conclusion that human-driven tree removal is an important contributor to the degradation of ASF is reasonable given the data reported in the article. Elephant damage, while clearly likely a very big contributor to habitat modification in ASF, was not the focus of the study (the authors state clearly in the Discussion that elephant damage was not systematically quantified, and thus no data were analyzed) ‐ and thus should only be mentioned in passing here ‐ if at all.

Introduction:

More information about the life history ecology of A. Sokokensis would provide welcome context here. A bit more detail about breeding sites as well as dispersal behavior etc. would be helpful – and especially why these and other aspects render the Pipit a good indicator species/proxy for habitat condition. This could be revisited in the Discussion as links are made between habitat conditions and occurrence of the bird (where you discuss the underlying mechanisms for why it thrives in some parts of ASF and not others, and why it’s abundance correlate strongly with some types of disturbance and not others). Again, you reference other studies that have explored other species in ASF and forest disturbance, but do not really explicitly state why the Pipit is a particularly important indicator of forest condition.

Methods:

Bird Survey: As described, all sightings and calls were recorded and incorporated into distance analysis – but it is not clear here whether or not distances to both auditory and visual encounters were measured the same way (i.e., with the rangefinder). Please clarify.
Floor litter sampling: Not clear here whether or not litter cover was recorded as a continuous or categorical variable (percentage). If not, please describe percentage “categories” used.

Results:

Mean litter depth graph (Figure 2) and accompanying text reports the means and sd but no post-hoc comparison test (e.g. Tukey HSD) – need to report the stats on which differences were/were not significant.
Figure 3 – you indicate litter depth was better predictor of bird abundance than litter cover, but r-squared is higher for litter cover. Need to clarify (and also indicate why you chose only to shown depth values in Figure 3.
The linear equation can be put in Figure 3 caption (not necessary to include in text).
Figure 4 – stats aren’t presented here; also, the caption states that tree loss and leaf litter are inversely correlated – this might be taken to mean, given discussion (below) about pruning, that there could be a poaching threshold below which poaching may pay dividends to Pipits (and above which Pipits are negatively affected). This warrants further exploration/elaboration.
The pruning result is arguably the most important one here – this suggests an intriguing trade-off between poaching and bird conservation (in particular, the suggestion that pruning by poachers may bolster Pipit populations – or at the very least mitigate against other aspects of habitat degradation). Worth highlighting this more in Discussion.

Discussion:

Last sentence on p. 7 suggests causality (“That is because…”) – but your data only support correlation (one can imagine that there may have been other extrinsic or intrinsic drivers of population decline).
P. 8: discussion of classification of habitat types in ASF is certainly interesting, but could be made much more succinct in keeping with focus of this paper.
P. 9, top: first paragraph could be expanded – as noted before, tradeoff between poaching/pruning and Pipit abundance is worth exploring in more depth. Could your results be taken as a prescription for understory pruning as a conservation tool for the Sokoke Pipit or other threatened species? More detail here would be welcome (and also in Conclusion); in subsequent paragraph about Pipit foraging behavior and specific relationship to understory vegetation at varying heights could be incorporated into this discussion. Is there any info about optimal perch height for foraging or for flying through the understory? Linking to results of other studies in ASF, is there potential for positive correlations with optimal habitat conditions for the other important bird species in ASF in order to make more general conclusions about management?

View Article & Reports

Show less

This well-organized review helped the authors improve their article

Lisa Locatello says:

Read full report

Bierbach and co-authors investigated the topic of the evolution of the audience effect in live bearing fishes, by applying a comparative method. They specifically focused on the hypothesis that sperm competition risk, arising from male mate choice copying, and avoidance of aggressive interactions play a key role in driving the evolution of audience-induced changes in male mate choice behavior. The authors found support to their hypothesis of an influence of SCR on the evolution of deceptive behavior as their findings at species level showed a positive correlation between mean sexual activity and the occurrence of deceptive behavior. Moreover, they found a positive correlation between mean aggressiveness and sexual activity but they did not detect a relationship between aggressiveness and audience effects.

The manuscript is certainly well written and attractive, but I have some major concerns on the data analyses that prevent me to endorse its acceptance at the present stage.

I see three main problems with the statistics that could have led to potentially wrong results and, thus, to completely misleading conclusions.

First of all the Authors cannot run an ANCOVA in which there is a significant interaction between factor and covariate Tab. 2 (a). Indeed, when the assumption of common slopes is violated (as in their case), all other significant terms are meaningless. They might want to consider alternative statistical procedures, e.g. Johnson—Neyman method.
Second, the Authors cannot retain into the model a non significant interaction term, as this may affect estimations for the factors Tab. 2 (d). They need to remove the species x treatment interaction (as they did for other non significant terms, see top left of the same page 7).
The third problem I see regards all the GLMs in which species are compared. Authors entered the 'species' level as fixed factor when species are clearly a random factor. Entering species as fixed factors has the effect of badly inflating the denominator degrees of freedom, making authors’ conclusions far too permissive. They should, instead, use mixed LMs, in which species are the random factor. They should also take care that the degrees of freedom are approximately equal to the number of species (not the number of trials). To do so, they can enter as random factor the interaction between treatment and species.

Data need to be re-analyzed relying on the proper statistical procedures to confirm results and conclusions.

A more theoretical objection to the authors’ interpretation of results (supposing that results will be confirmed by the new analyses) could emerge from the idea that male success in mating with the preferred female may reduce the probability of immediate female’s re-mating, and thus reduce the risk of sperm competition on the short term. As a consequence, it may be not beneficial to significantly increase the risk of losing a high quality and inseminated female for a cost that will not be paid with certainty. The authors might want to consider also this for discussion.

Lastly, I think that the scenario generated from comparative studies at species level may be explained by phylogenetic factors other than sexual selection. Only the inclusion of phylogeny, that allow to account for the shared history among species, into data analyses can lead to unequivocal adaptive explanations for the observed patterns. I see the difficulty in doing this with few species, as it is the case of the present study, but I would suggest the Authors to consider also this future perspective. Moreover, a phylogenetic comparative study would be aided by the recent development of a well-resolved phylogenetic tree for the genus Poecilia (Meredith 2011).

Minor comments:

Page 3: the authors should specify that also part of data on male aggressiveness (3 species from Table 1) come from previous studies, as they do for data on deceptive male mating behavior.

Page 5: since data on mate choice come from other studies is it so necessary to report a detailed description of methods for this section? Maybe the authors could refer to the already published methods and only give a brief additional description.

Page 6: how do the authors explain the complete absence of aggressive displays between the focal male and the audience male during the mate choice experiments? This sounds curious if considering that in all the examined species aggressive behaviors and dominance establishment are always observed during dyadic encounters.

View Article & Reports

Show less

An example of addressing statistical analysis

Chris Baker says:

In their response to my previous comments, the authors have clarified that only the data from the “Experimental phase” were used to calculate prediction accuracy. However, if I now understand the analysis procedure correctly, there are serious concerns with the approach adopted.

Read full report

First, let me state what I now understand the analysis procedure to be:

For each subject the PD values across the 20 trials were converted to z-scores.
For each stimulus, the mean z-score was calculated.
The sign of the mean z-score for each stimulus was used to make predictions.
For each of the 20 trials, if the sign of the z-score on that trial was the same as for the mean z-score for that stimulus, a hit (correct prediction) was assigned. In contrast, if the sign of the z-score on that trial was the opposite as for the mean z-score for that stimulus, a miss (incorrect prediction) was assigned.
For each stimulus the total hits and misses were calculated.
Average hits (correct prediction) for each stimulus was calculated across subjects.

If this is a correct description of the procedure, the problem is that the same data were used to determine the sign of the z-score that would be associated with a correct prediction and to determine the actual correct predictions. This will effectively guarantee a correct prediction rate above chance.

To check if this is true, I quickly generated random data and used the analysis procedure as laid out above (see MATLAB code below). Across 10,000 iterations of 100 random subjects, the average “prediction” accuracy was ~57% for each stimulus (standard deviation, 1.1%), remarkably similar to the values reported by the authors in their two studies. In this simulation, I assumed that all subjects contributed 20 trials, but in the actual data analyzed in the study, some subjects contributed fewer than 20 trials due to artifacts in the pupil measurements.

If the above description of the analysis procedure is correct, then I think the authors have provided no evidence to support pupil dilation prediction of random events, with the results reflecting circularity in the analysis procedure.

However, if the above description of the procedure is incorrect, the authors need to clarify exactly what the analysis procedure was, perhaps by providing their analysis scripts.

MATLAB code:

nTrials = 10; % 10 trials for each stimulus/condition
 
for boot = 1:10000 % 10,000 iterations for bootstrapping
   
  for i = 1:100 % 100 subjects
 
    data = randn(nTrials,2); % generate random values for each trial

    meandata = squeeze(mean(data(:))); % calculate mean
    stddata = std(data(:)); % calculate standard deviation

    zdata = (data - meandata)/(stddata); % convert to z-scores

    meancond1 = mean(zdata(:,1)); % calculate mean for each stimulus/condition
    meancond2 = mean(zdata(:,2));

    if meancond1 > 0 % evaluate sign of the mean values
      conscore = 1; % conscore indicates for which condition, positive z-values will indicate correctness
      conscoreB = 2; % conscoreB indicates for which condition, negative z-values will indicate correctness
    elseif meancond2 > 0
      conscore = 2;
      conscoreB = 1;
    else
      error = 'They are equal' % if mean z-values are equal, arbitrarily assign correctness
      conscore = 1;
      conscoreB = 2;
    end

    accScores(i) = sum(squeeze(zdata(:,conscore))>0)./nTrials; %calculate average correct for each condition for each subject
    accScoresB(i) = sum(squeeze(zdata(:,conscoreB))<0)./nTrials;

  end

  mAcc(boot) = mean(accScores); % calculate average correct for each condition across subjects for each iteration
  mAccB(boot) = mean(accScoresB);
 
end
 
meanBoot = mean(mAcc)% calculate mean correct for each condition across iterations
meanBootB = mean(mAccB)
 
stdBoot = std(mAcc)% calculate standard deviation for each condition across iterations
stdBootB = std(mAccB)

View Article & Reports

Show less

Positive comments in a detailed review

Sue Griffin says:

I think this paper excellent and is an important addition to the literature. I really like the conceptualization of a self-replicating cycle as it illustrates the concept that the “problem” starts with the neuron, i.e., due to one or more of a variety of insults, the neuron is negatively impacted and releases H1, which in turn activates microglia with over expression of cytokines that may, when limited, foster repair but when activated becomes chronic (as is demonstrated here with the potential of cyclic H1 release) and thus facilitates neurotoxicity. I hope the authors intend to measure cytokine expression soon, especially IL-1 and TNF in both astrocytes and microglia, and S100B in astrocytes.

Read full report

02/08/2013 ‐ additional comments

In more detail, Gilthorpe and colleagues provide novel experimental data that demonstrate a new role for a specific histone protein—the linker histone, H1—in neurodegeneration. This study, which was originally designed to identify axonal chemorepellents, actually provided a previously unknown role for H1, as well as other novel and thought provoking results. Fortuitously, as sometimes happens, the authors had a pleasant surprise: their results set some old dogmas on their respective ears and opened up new avenues of approach for studying the role of histones in self-amplification of neurodegenerative cycles. In point, they show that H1 is not just a nice little partner of nuclear DNA as previously thought. H1 is released from ‘damaged’ (or leaky) neurons, kills adjacent healthy neurons, and promotes a proinflammatory profile in both microglia and astrocytes.

Interestingly, the authors’ conceptualization of a damaged neuron → H1 release → healthy neuron killing cycle does not take into account the H1-mediated proinflammatory glial response. This facet of the study opens for these investigators a new avenue they may wish to follow: the role of H1 in stimulation of neuroinflammation with overexpression of cytokines. This is interesting, as neuronal injury has been shown to set in motion an acute phase response that activates glia, increases their expression of cytokines (interleukin-1 and S100B), which, in turn, induce neurons to produce excess Alzheimer-related proteins such as βAPP and ApoE (favoring formation of mature Aβ/ApoE plaques), activated MAPK-p38 and hyperphosphorylated tau (favoring formation of neurofibrillary tangles), and α synuclein (favoring formation of Lewy bodies). To date, the neuronal response shown responsible for stimulating glia is neuronal stress related release of sAPP, but these H1 results from Gilthorpe and colleagues may contribute to or exacerbate the role of sAPP.

View Article & Reports

Show less

About F1000Research