The usefulness of DNA microarray technology in the exploration of gene expression profiles can hardly be overstated. Along with the dramatic increase in microarray publications (a 2.5-fold increase per year since 1997, to >3,000 in 2004) and a broadening in the scope of applications, the methods of analysis of microarray data have grown in variety and sophistication, from simple fold-difference criteria to complex Bayesian procedures and clustering techniques [1–9]. In spite of these advances, evaluating variation and estimating the significance of observed differences in recorded signals remain difficult challenges. Existing methods provide various approximations of reality, balancing Type I against Type II error, but none can be considered ideal under all conditions. This is due mainly to the inherent complexity of the problem, but sometimes also to the use of oversimplified conditions. Mehta et al. offered an interesting overview of the subject, including a discussion of misconceptions about the generality and applicability of some approaches.
Quantities taken as a measure of gene expression are affected by a number of processes that contribute to variation, resulting in a random and/or pseudorandom component of the signal. Such variation may be separable into a "technical component," caused by technical factors such as variability of experimental protocols, autofluorescence and backscatter, laser-molecule interactions, and photomultiplier noise, and a "sampling" or "physiological component," which depends mainly on the variability caused by differences between the samples, e.g. differences in the biological state or in the purity of the sample composition. Determining whether a given gene expression intensity value exceeds the background noise, and whether it differs between two samples, are fundamental issues in microarray analysis.
For single-color Affymetrix arrays there are two groups of methods aiming at separation of the true signal from the random components: "low level" and "high level." The former deals with the fluorescence signals of individual probes and includes background correction, adjustment for the nonspecific signal and an expression summary that yields an approximation of RNA abundance or "gene expression"; the latter takes the gene expression itself as the elementary variable. Low-level analysis can be used only when a relatively large number (say 8 or more) of probes or probe pairs per probe set is available. Moreover, the standard methods, such as dChip [12, 13] or RMA [14–16], are not applicable if only duplicates are available and are not quite reliable for triplicates (RMAExpress is available at http://stat.berkeley.edu/~bolstad/RMAExpress/RMAExpress.html).
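To make the low-level steps concrete, the following sketch summarizes a probe set into one expression value per array. This is an illustrative simplification only, not the actual dChip or RMA algorithm (RMA, for instance, uses model-based background correction and median polish); the function name and the flat background parameter are our assumptions.

```python
import numpy as np

def expression_summary(probe_signals, background=0.0):
    """Simplified low-level expression summary (illustrative only).

    probe_signals: (n_probes, n_arrays) matrix of raw probe intensities
    for one probe set. Here we subtract a flat background, floor at 1 to
    keep the logarithm defined, log2-transform, and take the median
    across probes, yielding one expression value per array.
    """
    corrected = np.maximum(probe_signals - background, 1.0)
    log_vals = np.log2(corrected)
    return np.median(log_vals, axis=0)  # one value per array
```

The median across probes is used here simply as a robust summary; real pipelines differ in each of the three steps.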
The high-level analysis consists of two basic steps: normalization and statistical evaluation of the observed differences. One approach to normalization relies on "reference genes" (e.g. [17–20]), but genes providing a "ubiquitous reference" are hard to find and they require additional experimental effort. The other calculates normalization coefficients from the expression values themselves. When the measured signal depends linearly on RNA abundance and over- and under-expressed values are balanced, global normalization is suitable; in the case of nonlinearity, LOWESS or another appropriate correction has to be employed. Statistical significance of the observations is often estimated using standard parametric tests, such as the t-test or ANOVA. However, a certain percentage of the frequency distributions always deviates from the normal distribution, and in multiple comparisons of thousands of gene expressions this can lead to substantial error. Furthermore, the number of replicates is usually small and the estimated variances often differ greatly from the true values. Novak et al. suggested characterizing the dispersion patterns of Affymetrix arrays with the method of consecutive sampling, which uses groups of genes with close mean expressions to estimate the standard deviations; a similar approach was independently proposed by Baldi and Long and by Kamb and Ramaswami. A two-component model, including constant and proportional terms of the standard deviation, was introduced by Rocke and Lorenzato in the context of analytical chemistry and later applied to cDNA and oligonucleotide microarrays; see also [28, 29]. Choe et al. compared the performance of the t-test, the modified t-test developed by Tusher et al. and the method of Baldi and Long, and concluded that the last method showed, under the given conditions, superior performance. Some other approaches have also been suggested and tested. For example, Troyanskaya et al. examined three nonparametric methods, Durbin et al. proposed a variance-stabilizing transformation and Bilke et al. used a Bayesian approach. Among other publications, McClinick et al. deal with the reproducibility of microarray data, Kooperberg et al. compared several statistical methods and Jarvinen et al. compared different microarray platforms.
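The global normalization mentioned above amounts to a simple per-array rescaling. The sketch below is a minimal version, assuming the median is used as the array summary and the mean of the per-array medians as the common target; the function name and these particular choices are ours, not prescribed by any of the cited methods.

```python
import numpy as np

def global_normalize(expr):
    """Global (scaling) normalization, a minimal sketch.

    expr: (n_genes, n_arrays) matrix of expression values. Each array is
    multiplied by a constant so that all arrays end up with the same
    median intensity. As noted in the text, this is appropriate only
    when the signal is linear in RNA abundance and over- and
    under-expression are roughly balanced; otherwise an intensity-
    dependent correction such as LOWESS is needed.
    """
    medians = np.median(expr, axis=0)   # per-array medians
    target = np.mean(medians)           # common target level
    factors = target / medians          # per-array scaling factors
    return expr * factors, factors
```

After normalization every column of the returned matrix has the same median, so array-wide intensity differences no longer masquerade as differential expression.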
Many new microarray-based platforms are available and some, which allow parallel analysis of many samples, may be suitable for high-throughput analysis. Here we utilized the Illumina GEX Sentrix™ Array Matrix (SAM) system to evaluate gene expression for 632 genes in a 96-well format. Our first aim is to characterize the expression data and assess various sources of dispersion. We describe the data obtained from replicate hybridizations, reverse transcription reactions, and biological cultures and evaluate the frequency distributions. Subsequently, we compare the dispersion patterns and assess the contribution of each additional process to the variability of the data. The second aim is to study systematic differences in gene expression values between control cell cultures and cell cultures subjected to a particular treatment. We analyze the data from a cell line subjected to a continuous low-dose oxidative stress (~10 μM H2O2) for 24 hrs. In our analysis we use the consecutive sampling method, which quantifies dispersion between two samples by ranking the probe sets according to the mean signal intensity, grouping them into bins containing k consecutive gene pairs, and calculating standard deviations from the differences of expressions (in this study k = 12). We search for the best candidate genes affected by the treatment among the differentially expressed genes, using consecutive sampling and the coincidence test. The results are compared to the t-test and the Wilcoxon (Mann-Whitney) nonparametric test. In addition, we examine the consistency of the results obtained by the coincidence test and compare them to the t-test on normalized data, log-transformed data and data subjected to the variance-stabilizing transformation, to the method of analysis by Tusher et al. and to the CyberT method of Baldi and Long.
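The binning procedure just described can be sketched as follows. This is a minimal reading of the consecutive sampling idea, not the authors' exact implementation; in particular, the √2 factor (converting the SD of a difference of two equally noisy measurements into the SD of a single measurement), the handling of the incomplete last bin, and the function name are our assumptions.

```python
import numpy as np

def consecutive_sampling_sd(x, y, k=12):
    """Consecutive-sampling dispersion estimate, a minimal sketch.

    x, y: expression values of the same genes in two samples. Genes are
    ranked by mean intensity, grouped into bins of k consecutive gene
    pairs, and a standard deviation is estimated in each bin from the
    pairwise differences x - y. Returns (bin mean intensities, bin SDs).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    mean = (x + y) / 2.0
    order = np.argsort(mean)            # rank genes by mean expression
    diff = (x - y)[order]
    mean_sorted = mean[order]
    n_bins = len(x) // k                # drop the incomplete last bin
    bin_means, bin_sds = [], []
    for b in range(n_bins):
        d = diff[b * k:(b + 1) * k]
        bin_means.append(mean_sorted[b * k:(b + 1) * k].mean())
        # SD of one measurement from paired differences (assumes equal
        # noise in both samples): sd(d) / sqrt(2)
        bin_sds.append(d.std(ddof=1) / np.sqrt(2.0))
    return np.array(bin_means), np.array(bin_sds)
```

Plotting the bin SDs against the bin mean intensities then gives the intensity-dependent dispersion pattern against which individual gene differences can be judged.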