Spotted cDNA microarrays are a tool for high-throughput analysis of gene expression (Brown and Botstein (1999)). In the first step of the technique, DNA is “spotted” and immobilized on glass slides or another substrate; these slides are the microarrays. Each spot on an array contains a particular sequence, although a sequence may be spotted multiple times per array. Next, mRNA from the cell populations under study is reverse-transcribed into cDNA, and one of two fluorescent dye labels, Cy3 or Cy5, is incorporated. Two pools of differently labeled cDNA are mixed and washed over an array. Dye-labeled cDNA can hybridize with complementary sequences on the array, and unhybridized cDNA is washed off. The array is then scanned for Cy3 and Cy5 fluorescent intensities. The idea is that the mRNA sample that contained more transcript for a given gene should produce higher fluorescence in the corresponding label in the spot containing that gene. The experimental data consist of Cy3 and Cy5 measurements for every spot on every array.
The data studied here come from an experiment on 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) (Martinez and Walker, unpublished data). This compound is known to induce a wide range of biological and biochemical responses, including gene induction. The experiment used the human hepatoma cell line HepG2 as an in vitro model to study TCDD. HepG2 is an established cell line for which metabolic enzymes are known to be inducible (Kikuchi, Hossain, Yoshida and Kobayashi (1998); Li, Harper, Tang and Okey (1998)). Thus HepG2 can be considered a prototype for the TCDD response.
The experimental design included replication to control the noise that is associated with microarray data. Although each gene was spotted only once per array, replication was achieved by using six arrays to study the two samples instead of just one or two. A separate labeling reaction was performed for each hybridization. Each array was spotted with the same set of 1920 genes. The README summarizes the “triple dye-swap” experimental design. As in Kerr and Churchill (2000), we refer to the TCDD-treated and control cell lines as “varieties.” Control cells are variety 1 and treated cells are variety 2. We refer to the fluor Cy3 as dye 1 and Cy5 as dye 2. Ninety-eight entries in the data file, corresponding to 13 genes, were exactly 1 and appeared to be artificial “floor” values. These 13 genes were removed from the dataset before analysis. There was no other data pre-processing. Thus the cleaned dataset has complete data for the remaining 1907 of the 1920 genes.
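The cleaning step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the analysis: the array layout (genes by arrays by dyes), the function name `drop_floor_genes`, and the toy intensities are all assumptions made for the example. The only rule taken from the text is that a gene is dropped if any of its entries is exactly 1.

```python
import numpy as np

FLOOR_VALUE = 1.0  # entries exactly equal to this are treated as artificial floor values


def drop_floor_genes(intensities, floor=FLOOR_VALUE):
    """Remove every gene that has at least one entry equal to `floor`.

    intensities: array of shape (n_genes, n_arrays, n_dyes); this layout
    is an assumption for the sketch, not the format of the actual data file.
    Returns the cleaned array and a boolean mask of the genes that were kept.
    """
    intensities = np.asarray(intensities, dtype=float)
    # Flatten each gene's measurements and flag genes with any floor entry.
    per_gene = intensities.reshape(intensities.shape[0], -1)
    has_floor = (per_gene == floor).any(axis=1)
    keep = ~has_floor
    return intensities[keep], keep


# Toy data (hypothetical): 5 genes, 6 arrays, 2 dyes, with floor values
# planted in genes 1 and 3.
rng = np.random.default_rng(0)
toy = rng.uniform(50.0, 5000.0, size=(5, 6, 2))
toy[1, 0, 0] = 1.0
toy[3, 2, 1] = 1.0
toy[3, 4, 0] = 1.0

cleaned, keep = drop_floor_genes(toy)
print(cleaned.shape)   # (3, 6, 2)
print(keep.tolist())   # [True, False, True, False, True]
```

Applied to the full dataset under the same assumed layout, the input would have shape (1920, 6, 2) and the mask would retain 1907 genes. Removing whole genes, rather than individual entries, keeps the data complete for every remaining gene.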