One of my favorite pastimes lately is collecting examples from the geological literature in which the statistical analysis went incredibly wrong. Take for example the papers dealing with grain-size distributions that advertise cumulative probability plots as the best technique to identify subpopulations in a mixed distribution. Here is what G.S. Visher says in his 1969 paper on "Grain size distributions and depositional processes" (Journal of Sedimentary Petrology, v. 39, p. 1074-1106):
"The most important aspect in analysis of textural patterns is the recognition of straight line curve segments. In figure 3 four such segments occur on the log-probability curve, each defined by at least four control points. The interpretation of this distribution is that it represents four separate log-normal populations. Each population is truncated and joined with the next population to form a single distribution. This means that grain size distributions do not follow a single log-normal law, but are composed of several log-normal populations each with a different mean and a standard deviation. These separate populations are readily identifiable on the log-probability plot, but they are difficult to precisely define on the other two curves." (p. 1079)
I am wondering whether this tendency to see straight-line segments in cumulative probability plots, and to give them special significance, is a syndrome restricted to geologists - whose pattern-recognition abilities are generally excellent - or whether similar examples can be found in other fields as well. The fact that a certain distribution plots as a straight line on a cumulative probability plot does not mean that a mixture of distributions of the same type will plot as a set of straight-line segments. The excellent sedimentologist Robert Folk pointed this out in a 1977 discussion of a paper coauthored by Visher (in which the authors try to prove that the Navajo Sandstone is not an eolian deposit - yeah, right):
"A general defect of the Visher method is exemplified by Kane Creek #2, which is shown as consisting of four straight line segments, implying that it is a mixture of four populations. It can be proved by anyone using probability paper and ordinary arithmetic that such kinky curves can be made by a simple mixing of two (not four) populations that are widely separated; the 'flat' portions represent the gaps in the distribution. Furthermore, mixing of populations on probability paper results in smoothly curving inflexions, not angularly joined straight-line segments."
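Folk's "ordinary arithmetic" is easy to reproduce. What follows is an illustration, not a calculation from either paper: it assumes two log-normal subpopulations (which are normal on the phi scale) with hypothetical means of 1 and 5 phi, a common sorting of 0.4 phi, and a 50:50 mixture. It computes the curve that would be drawn on probability paper, i.e. the standard normal quantile of the cumulative proportion, and its local slope. A single population gives a constant slope of 1/sigma; the mixture gives a steep stretch, a nearly flat stretch across the gap between the two modes, and another steep stretch, joined by smooth bends rather than angular kinks.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal; its inverse CDF gives the probability-paper axis


def mixture_cdf(phi, w, m1, s1, m2, s2):
    """Cumulative proportion at or below phi for a two-component normal mixture.

    Grain sizes that are log-normal in mm are normal on the phi scale, so each
    component stands for one log-normal subpopulation.
    """
    return (w * NormalDist(m1, s1).cdf(phi)
            + (1.0 - w) * NormalDist(m2, s2).cdf(phi))


def probit_curve(phis, w, m1, s1, m2, s2):
    """Ordinate of each point on probability paper: the normal quantile of F(phi)."""
    return [STD.inv_cdf(mixture_cdf(p, w, m1, s1, m2, s2)) for p in phis]


def local_slopes(xs, zs):
    """Finite-difference slope between consecutive points of the plotted curve."""
    return [(z2 - z1) / (x2 - x1)
            for x1, x2, z1, z2 in zip(xs, xs[1:], zs, zs[1:])]


# Hypothetical grid and parameters, chosen only for illustration.
phis = [-1.0 + 0.05 * i for i in range(161)]  # -1 to 7 phi

# One population (w = 1), kept within 4 sigma of its mean so F stays inside (0, 1):
# a straight line with constant slope 1 / 0.4 = 2.5.
near = [p for p in phis if abs(p - 1.0) <= 1.6]
single = local_slopes(near, probit_curve(near, 1.0, 1.0, 0.4, 5.0, 0.4))

# Two widely separated populations, mixed 50:50.
mixed = local_slopes(phis, probit_curve(phis, 0.5, 1.0, 0.4, 5.0, 0.4))

for phi, slope in zip(phis[::20], mixed[::20]):
    print(f"phi = {phi:5.2f}   slope on probability paper = {slope:.4f}")
```

The printed slopes drop from more than 2 to essentially zero across the gap and then climb back, so fitting a straight line to each stretch would suggest three "populations" where there are only two, and the transitions between the stretches are gradual.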
Despite this, fitting multiple straight lines to cumulative probability plots is fashionable again, although this time it is done on log-log plots of the exceedance probability of bed-thickness or fault-size data. But this is going to be part of a paper that I am working on right now (on evenings and weekends...), so more about this later.
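The same arithmetic carries over to log-log exceedance plots. As an illustration only (not the analysis from the paper in preparation), assume hypothetically that each of two subpopulations has a pure power-law (Pareto) tail, which on its own plots as a straight line of slope -a on log-log axes. For a mixture, P(X >= x) = w (x/xmin)^(-a1) + (1 - w) (x/xmin)^(-a2), and the local log-log slope is -(a1 t1 + a2 t2) / (t1 + t2), where t1 and t2 are the two terms of that sum. The slope drifts continuously from -(w a1 + (1 - w) a2) at xmin toward -min(a1, a2) at large x, so the mixture bends smoothly instead of breaking into straight segments.

```python
import math


def mixture_exceedance(x, w, a1, a2, xmin=1.0):
    """P(X >= x) for a w : (1 - w) mixture of two Pareto tails with a common xmin."""
    return w * (x / xmin) ** (-a1) + (1.0 - w) * (x / xmin) ** (-a2)


def loglog_slope(x, w, a1, a2, xmin=1.0):
    """Closed-form local slope d ln P / d ln x of the mixture's exceedance curve."""
    t1 = w * (x / xmin) ** (-a1)
    t2 = (1.0 - w) * (x / xmin) ** (-a2)
    return -(a1 * t1 + a2 * t2) / (t1 + t2)


def numeric_slope(x, w, a1, a2, h=1e-6):
    """Central-difference estimate of the same slope, as a check on the algebra."""
    up = math.log(mixture_exceedance(x * math.exp(h), w, a1, a2))
    down = math.log(mixture_exceedance(x * math.exp(-h), w, a1, a2))
    return (up - down) / (2.0 * h)


# Hypothetical parameters: a 50:50 mix of power-law tails with exponents 3 and 1.
W, A1, A2 = 0.5, 3.0, 1.0
for k in range(5):
    x = 10.0 ** k
    print(f"x = {x:8.0f}   P(X >= x) = {mixture_exceedance(x, W, A1, A2):.3e}"
          f"   log-log slope = {loglog_slope(x, W, A1, A2):.4f}")
```

With these numbers the slope moves from -2 at x = 1 to within about 2 percent of -1 by x = 10; a plot covering only part of that range could be read as two straight segments with a "break" in between, even though the underlying curve has no break.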