Date: Fri, 14 Oct 2005 02:37:28 +0200
From: Darlene Goldstein
To: Rudy Guerra
Subject: clk rev
Attachment: kooperberg-rev1.tex (text/x-tex, 85.21 KB)

Hi Rudy,

Well, at long last, I have managed to slog through Charles's ms.........

The short version: It's basically a nicely done, workmanlike study, but it could have been written with more care. There are very many things in it that need clarifying, mostly minor but still necessary; a few need careful attention. There is also a tendency to overstatement that I think should be toned down. I have fixed a number of spelling and small grammatical errors in the attached .tex. You might also want to check that their plot lines are thick enough. On p. 2 (combining genes), it might make sense to refer to the Morris et al. idea (in another chapter of the book). Before resubmission, the authors should give the revision a very careful reading for coherence and also use a spell checker.

The 2 most important things:

1) The ms needs a little more explicit 'meta-analysis motivation'; as written, it focuses more on a broader sense of 'combining information'.

2) They need to stop referring to the tests with the most rejected hypotheses as 'most powerful'. This is really a major issue.

Once they resubmit I am happy to go through their revision to tighten it up where necessary. It shouldn't need to go back to them again after they revise, assuming they do what I asked.

Longer version: see comments below my signature; all may be sent to the authors.

Editorial comments (fixing these after revision is acceptable):

- citation style: remove the hard-coding and change \nocite{} to \citet{} or \citep{}
- terms you might want to standardize throughout the book:
  * T-test, T-statistic, T-distribution (normally it's $t$-test, etc.)
  * P-value, Z-statistic
  * R-package, R-software
  * R package names: use tt font?? Also, use the same capitalization as the package name
  * italicize e.g., i.e., and other Latin terms

While the Cards are on a roll, so am I.........so send my next 'job' along!

Best,
D

--
Darlene Goldstein
École Polytechnique Fédérale de Lausanne (EPFL)
Institut de mathématiques
Bâtiment MA, Station 8
CH-1015 Lausanne
SWITZERLAND
Tel: +41 21 693 2552
Fax: +41 21 693 4303

----------------------------------------------------------------------------------

Comments for Kooperberg et al. authors

General comments

This ms describes results of a power and validity comparison study of several test statistics for small sample sizes. It seems to be mainly a summary of a previously published paper on this topic for single channel ('one color') arrays, with some expansion for dual channel ('two color') arrays. Although the study does seem to make a useful, though somewhat duplicated, contribution, the ms does not seem to directly address combining data or results. It appears that only one of the methods ('pooled t') uses information across experiments. The other methods rely on combining information across genes, but apparently only within study. As 'meta-analysis' is the primary book topic, some more explicit attempt beyond the small bit in the methods section should be made to address this issue.

There also seems to be the implication that 'more genes called differentially expressed' means the method is 'more powerful'. This must be changed (more on this below, p. 13 comments).

Before resubmission, the authors should give the revision a very careful reading for coherence and also use a spell checker.

Specific comments

p. 1
- 'summarize results ... couple of additional methods': is it stated anywhere which are the 'original' and which are the 'additional' methods? Otherwise, it's not clear that a distinction should be made, in which case you could just say that you 'expand on the results from the previous analysis' or something to that effect.
- 'red-green': does not seem necessary to say this.
- might consider using the terms 'single channel' and 'two channel' (or 'dual channel') instead of 'one-color' and 'two-color' throughout
- 'The limited number of repeats ... make small sample comparisons unattractive': huh? It is the limited number of repeats that makes small sample comparisons necessary! Maybe you just mean to refer to large variability here?
- why include 'spotted' along with two-color?

p. 2
- choices for combining genes: also include the Morris et al. idea (in another chapter of the book)??
- 'multiple comparisons procedure': what is meant here?? A FWER procedure? Isn't FDR a type of 'multiple comparison' procedure??
- last para, 'one color' (oligonucleotide) arrays: there are other single channel array types (like nylon filters), so maybe say 'e.g. oligonucleotide'

p. 3
- (log) = log2 ratio?
- 2-color arrays: what is the superscript m for in x^m_{ijl}? Is it to indicate that you are looking at (log2) fold change, i.e. M values? Why not simplify the notation by cutting out the x here? Or is 'm' here a placeholder for permutation? This is quite unclear.
- It appears (especially below, where the permutation procedure is described) that the 2-color array expts are meant to be direct designs. If this is right, it should be explicitly stated. If not, then some comments or explanation regarding the design would seem to be required here.
- \mu_{ij} = 'true' (log) expression: true (log2) 'mean' expression??
- sigma tilde estimates the var or the SD (it says var but looks like it should be SD)??

p. 4
- x_{ijkl} has not been (explicitly) defined. Maybe it makes sense to define it before defining x^m_{ijl}.
- 't-test has almost no power for small sample sizes, shown in Kooperberg 2005': Do the authors really intend to state that they have discovered the low power of the t-test for small samples?? Surely this has been shown much, much earlier in the general context, and more specifically in the microarray context by Lonnstedt and Speed at least as early as 2002. Even the next paragraph indicates that it has long been known that the t-test performs poorly; otherwise, why the need for 'better estimates'?
- no cite for the lpe pkg; the url given is for the R project, not for CRAN (http://cran.r-project.org/)

p. 5
- 'reimplemented their approach': something wrong with it? not available?
- 'first three approaches': these do not match the order of the methods in the preceding paragraph
- 'combine ... with another estimate \sigma_{0ij}': if this is a variance estimate, it should be \sigma^2_{0ij}. Also, is there any reason the df in the denominator have been rearranged? It would seem more natural to write d_{ij} + L_j - 1 (ditto with \nu_0 below).

p. 6
- the package is limma, not Limma
- why include the description of the RVM model if it is not included in the comparison?
- in 'Shrinking', is the denominator 'I' supposed to be the total number of genes? If so, this deviates from the notation for the gene total given previously on p. 3 (= n).
- Methods combining experiments: it should perhaps be explicitly stated that at least the platforms (if not the labs) should be the same for these approaches.
- '... and it is thus reasonable to assume (variance same)': It is not clear why it follows from there being several small expts that the within gene variance across expts should be (even roughly) the same. This assumption does seem necessary, but it really is an assumption and not a logical deduction.
- it looks like a summation under the sqrt in the denominator has been left out

p. 7
- l. 2: the variance should be \sigma^2_i
- 'pooling typically has no effect ... as was confirmed ...': you saw it twice and that 'confirms' a general trend?? This seems an overstatement that should be rephrased.
- sign flipping for the perm test: this appears to assume that the design is direct, is that right? Is the 'm' indexing permutations here the same 'm' as in x^m above, or a different one? The notation could use some clarification.
- In the formula, I() represents the indicator function; this should be stated.
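In case it helps the authors clarify the write-up, here is a minimal sketch of what I take the sign-flipping test to be, assuming a direct design where each array's log-ratio sign can be flipped under the null; the function name and data are mine, not from the ms:

```python
from itertools import product

def sign_flip_pvalue(log_ratios):
    """Two-sided sign-flip permutation p-value for one gene's mean
    log-ratio. Assumes a direct two-color design: under the null the
    log-ratios are symmetric about 0, so each array's sign may flip."""
    n = len(log_ratios)
    t_obs = abs(sum(log_ratios) / n)
    hits = 0
    # Enumerate all 2^n sign assignments; the comparison below plays
    # the role of the indicator I(|t_perm| >= |t_obs|).
    for signs in product((1, -1), repeat=n):
        t_perm = abs(sum(s * x for s, x in zip(signs, log_ratios)) / n)
        hits += t_perm >= t_obs
    return hits / 2 ** n
```

For example, with n = 4 arrays there are only 2^4 = 16 sign assignments, so the smallest attainable two-sided p-value is 2/16 = 0.125 (the observed assignment plus its mirror image).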
(Previously, 'I' has also referred to the number of genes; I'm assuming this is an error that will be fixed.)

p. 8
- 'In addition we included four unrelated Drosophila cell line arrays.': what does this mean? Which arrays are these?

p. 9
- '... are not repeat arrays using the same samples (sometimes referred to as 'technical repeats') ...': it is not the same samples that are hybed in a 'technical replicate', but rather samples extracted from the (exact) same RNA source. Multiple samples from a common pool, for example, as in the 2-color expts, are technical reps, but are not the 'same' sample. In addition, it makes sense to move the distinction between biological and technical replication to before the description of the 2-color data, as you will then have the opportunity to point out which experiments involve which type of replication.
- It's not clear that there is a need to state that some data are not being used (striatum); it only brings up the question of why.
- '... some expts are intended to establish ... size ... power': why not explicitly say that the S expts are for size and the D expts for power?
- Assuming that (the significance level or test size) \alpha is a number between 0 and 1, '\alpha \%' should be '100 \alpha \%'

p. 10
- '... removal of spots that were too close to background': it is the fluorescence intensity (or expression) that is too close to background, not the spot/gene

p. 11
- The paragraph should be modified to remove as many instances of 'we' and 'us' as possible.
- '... plot sorted p-values (horizontal) against ...': you plot y against x, that is, vertical against horizontal, so switch the order of the axis description.

p. 13
- paragraph describing table 1.4: should also comment on the conservativeness of several of the approaches
- next paragraph, 'they are likely fairly robust': Again, you see it twice, so your conclusion is 'likely'? How about instead something like 'they appear to be fairly robust'.
- next paragraph, '... the loess approach is the most powerful.': I very strongly object to the equation of 'more genes called differentially expressed' with 'more powerful' in the absence of Type I error rate control. No distinction is made between true and false positives here. In fact, this is quite a dangerous statement to make, as a reader not giving the ms a very careful examination may pick up the 'take home' message that 'loess is best', since you have said it is most powerful. This interpretation must be rewritten in a much more careful and correct manner.

p. 15
- table 1.5: reword or remove the statement 'The larger the percentage of differentially expressed genes, the more powerful a method is.' This makes no distinction between true and false positives, and also does not account for the different apparent test sizes.

p. 16
- '... permutation approach ... yields approximately unbiased p-values ...': it appears that all methods produce rather conservative p-values here.
- 'There are about 10,000 genes on these arrays ...': earlier it says there are about 11,000; presumably you are rounding to 10k here to make the argument more transparent, is that right?
- In your illustration, you choose 40% differentially expressed genes, a figure that seems pretty unreasonable in most contexts. Wouldn't a smaller figure more in line with reality do just as well?

p. 20
- At the end of the argument, you might remind the reader that the reason the problem goes away with more arrays is that there are more permutations available.
- '... fewer than, say, six repeats ...': maybe change to 'fewer than six to eight'. Presumably the Welch statistic suffers because the variance estimate is not pooled; you might point that out here as well.
- 'when the sample size is really small (n=2) it (pooled t) performs even better (than limma/Cyber-T)': It should be clarified whether the comparison here is between pooled t and the pooled versions of limma/Cyber-T, or just the single experiment versions (in which case the comparison seems a bit unfair).
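To make that p. 20 reminder concrete, a back-of-envelope sketch (assuming the sign-flipping test of p. 7, where each array contributes one sign; the function name is mine):

```python
# For a sign-flip permutation test on n arrays there are 2**n distinct
# sign assignments, and the smallest attainable two-sided p-value is
# 2 / 2**n (the observed assignment plus its mirror image).
def min_two_sided_p(n_arrays):
    return 2 / 2 ** n_arrays

attainable = {n: min_two_sided_p(n) for n in range(2, 9)}
# n = 2 gives 0.5; p < 0.05 first becomes attainable at n = 6
# (2/64 ~ 0.031), in line with the 'six (to eight) repeats' remark.
```

So with very few arrays the permutation distribution is simply too coarse, and adding arrays helps because the number of available permutations grows exponentially.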
- borrowing df from other experiments: the sim study suggests there can be a 'reasonable amount of expt to expt variation ...': what does 'reasonable' mean here? What would be an 'unreasonable' amount?

p. 21
- '... can without much problem combine cell line RNA with fly RNA ... confirms ...': what does this mean? That there's not a 'problem' in doing it? What kind of 'problem'? That the results are meaningful (or, even better, correct)? It does not seem that this has really been shown. Using the term 'confirms' here again seems an overstatement.
- 'Methods which solely use a smoothed est of var ...': this result seems very important, especially as the 'smoothed var' approaches are quite similar, at least superficially, to the 'borrowing strength' approaches; that is, they all 'do something' to the variance. This might be spelled out more clearly for the reader and perhaps given a bit more emphasis, particularly as this conclusion relates directly to the theme of the book as a whole.

R and BioConductor packages: each individual package should be cited (use citation() in R for a suggested reference)

p. 24
- 'Normalization of arrays': in fact, this appendix discusses all preprocessing steps taken to arrive at the 'final' signal that will be used, not just normalization. The name should be changed to 'Preprocessing of arrays' or 'Signal (or expression) quantification' or something similarly broad.
- two-color arrays: you might mention the scanning software (or methods) and hardware used for the arrays.
- both array types: 'We employed various graphical QC tools, and felt that all arrays were of good quality.' This is not very specific; please refer to the methods and/or plot types that you used. And how the authors 'feel' about the quality of the arrays is not really relevant; it is preferable to indicate that the arrays were all of acceptable quality based on whatever the tests/methods were.
- one-color arrays, 'normalized by the RMA algorithm ...': RMA is an algorithm for quantifying expression. It consists of 3 steps (background correction, normalization, summarization) and is not just normalization. So rather than saying 'normalized', it should be 'quantified' or some similar term.
- 'log of the MAS5 Average Difference summary': the term 'Average Difference' may be confusing here; this was the (very poor) measure of gene expression that MAS4 (not MAS5) used. The expression measure computed by MAS5 is commonly referred to simply as 'MAS5', although maybe Affymetrix has some other 'official' name for it. In any case, leave out 'Average Difference'. There is also no citation for MAS5.
- It is surprising that you got 'essentially the same results' using MAS5 and RMA, as MAS5 has been shown to have different performance properties than RMA. Were the results qualitatively similar, or did you really get comparable numerical results? If the latter, that might need some explanation.
- 'For RMA we normalized all arrays simultaneously': again, 'quantified' (not 'normalized'). Also, where you say you 'analyzed' each of the experiments separately, do you mean 'quantified'? And what does 'essentially the same' mean here? Broadly similar?
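P.S. On the 'most rejections = most powerful' issue (pp. 13 and 15), here is a toy simulation the authors could adapt to see the problem: under a complete null, an anti-conservative test calls many more genes than a calibrated one, yet every one of its calls is a false positive. All names and cutoffs below are mine, purely for illustration.

```python
import random
random.seed(1)

def reject(x, y, crit):
    # Toy two-sample test: reject when |difference in means| exceeds
    # crit times the unpooled (Welch-type) standard error.
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return abs(mx - my) > crit * (vx / nx + vy / ny) ** 0.5

genes = 2000
strict = loose = 0
for _ in range(genes):
    # Complete null: both groups drawn from the same distribution,
    # n = 3 arrays per group, so every rejection is a false positive.
    x = [random.gauss(0, 1) for _ in range(3)]
    y = [random.gauss(0, 1) for _ in range(3)]
    strict += reject(x, y, 4.30)  # conservative cutoff (~ t_{2, 0.975})
    loose += reject(x, y, 1.50)   # anti-conservative cutoff
```

Here 'loose' calls far more genes differentially expressed than 'strict', even though nothing is differentially expressed at all; counting rejections without checking the realized Type I error rate says nothing about power.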