Data set or dataset: use one form consistently.
Citations are hard-coded into both the text and the bib file.
Made a few small grammatical corrections, corrected a number of spelling errors, and made names consistent.
Changed " to `` '' for nicer-looking quotes.

p. 7, last sentence: the method identified more genes, but how do the authors know this is a benefit? The additional genes could just as well be false positives.

p. 8, 'various quality control checks': what are these? In the Michigan set, 10 of 86 arrays (about 12%) were discarded. This is a substantial proportion of the original set. To what extent could these `poor quality' arrays have been responsible for the original findings? That is, we would like to know that the differences in results are due to methodological differences rather than to differences in the data set actually analyzed.

pdnn: it is hard to know what its properties are, since it is not included in affycomp.

Fig. 1.2: This is a poor graphic. The plots should be square rather than rectangular, and it would be even better to plot the difference vs. the average for both quantities, to highlight more clearly the differences between the two data sets.

Fig. 1.3: Again, it would be useful (though not strictly necessary) to plot this in a difference vs. average configuration as well. Also, it would make more sense to me to see partial vs. full (rather than full vs. partial, as shown), since in some sense we might think of partial as a function of full.

Will the book be printed in color? If not, the figures that use color would need to be redone using different line types. In general, the line widths and label sizes of the plots should be increased for readability.

Fig. 1.6: The caption does not explain what the two colors represent (which curve is red and which is black?). The caption might also state which test the p-value corresponds to.

The comparison of correlations across quantification methods (sec. 1.7.2) is not entirely clear to me. Were the methods applied 'out of the box' to the reduced set of probesets? Does 'pdnn' refer to pdnn plus the other preprocessing?
Was any additional preprocessing done with the other methods? Which dChip method was used? (I assume the PM-only model, but I did not see this stated anywhere.)

The authors have done a nice job of motivating, implementing and demonstrating their methods; this type of work is clearly necessary as a prelude to combining raw data for analysis. Like them, I believe that this approach will prove more sensitive (and therefore ultimately more fruitful) for finding relevant genes. However, the conclusion seems somewhat overstated. They state that they have demonstrated the benefit of pooling data; in fact, it seems to me that they have argued that pooling data should be beneficial and have shown how it might be done (and, again, they have done this well). I may have missed it, but it was not sufficiently clear to me that their findings represent true (rather than false) positives. Perhaps they could give the supporting literature findings more emphasis, or be more cautious in concluding that the genes found are good (true) ones. In any case, if they do have evidence that the genes found are true positives, it would be beneficial to display it more prominently. If not, they might simply state that their method appears promising. Unlike a number of other methods, theirs is based on fundamentally sound biological principles, which clearly works in its favor.
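As a minimal sketch of the difference vs. average display suggested for figs 1.2 and 1.3 (a Bland-Altman or MA-style plot), the hypothetical Python function below turns paired values, for example a per-gene quantity computed on the full and on the partial data set, into (average, difference) coordinates that could then be scatter-plotted. Taking the difference as partial minus full follows the suggestion to treat partial as a function of full. The function name and example numbers are illustrative assumptions, not taken from the manuscript.

```python
from typing import List, Sequence, Tuple


def difference_vs_average(
    reference: Sequence[float], other: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (average, difference) pairs for a difference vs. average plot.

    The difference is other - reference, so points above zero mean the
    second quantity (e.g. partial) exceeds the first (e.g. full).
    """
    if len(reference) != len(other):
        raise ValueError("inputs must have the same length")
    return [((r + o) / 2.0, o - r) for r, o in zip(reference, other)]


if __name__ == "__main__":
    # Hypothetical per-gene values: full data set vs. partial data set.
    full = [1.0, 2.0, 3.0]
    partial = [1.5, 2.0, 2.0]
    for avg, diff in difference_vs_average(full, partial):
        print(f"average={avg:.2f} difference={diff:+.2f}")
```

Plotting the difference against the average, rather than one quantity against the other, puts agreement on a horizontal line at zero, which makes systematic departures between the two data sets easier to see.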