GE Healthcare
 

MICROARRAY

Lucidea Microarray ScoreCard: An integrated analysis tool for microarray experiments

March 2001

H. Samartzidou, L. Turner, T. Houts, M. Frome, J. Worley, and H. Albertsen
Molecular Dynamics, 928 E. Arques Ave., Sunnyvale, California 94085

Gene expression profiling is used to better understand the molecular mechanisms that govern cellular function and growth. In this area, microarray analysis has emerged as a powerful, high-throughput tool to measure the relative expression levels of thousands of individual genes in parallel, under different experimental conditions (1, 2). Microarray measurements often appear to be systematically biased, however, and the numerous contributing factors are poorly understood. Ideally, gene expression data should be evaluated by relating experimental data to data obtained from control genes of known concentration and ratio. We have developed an integrated set of controls and analysis software that allows the user to evaluate relative data quality using predefined genetic targets and spikes included on each slide.

Introduction

The Lucidea™ Microarray ScoreCard™ 1.0 system consists of three interdependent components: control DNA targets, control mRNA spike mixes, and the ScoreCard analysis software.

The control targets are arranged in the control plate to make optimal use of the design of the Generation III Array Spotter and the geometric patterns of target spots on the microarray (3). Twelve replicas of the controls are spotted on each slide, so that each pen spots one replica. This layout of the control targets on each array allows comparison between pens within individual experiments, and it provides a basis for comparing measurements across multiple slides.

The spike mix consists of in vitro transcribed yeast intergenic region (YIR) mRNAs corresponding to dynamic range and ratio controls, each represented at a specific concentration and ratio. These spike mixes are added to the mRNA sample before labelling, allowing confirmation of effective labelling and hybridization. The concentrations are designed to allow the user to determine the sensitivity of the system for each experiment.

Lucidea Microarray ScoreCard software processes microarray experimental data from a slide and calculates a variety of quality measures utilizing the control elements described above. In addition, the analysis software normalizes the data, using a proprietary algorithm based on all elements present in the array. Normalized data can then be exported for data visualization and mining.

In this report, we show the use of the control plate and spikes in microarray experiments. The data generated in these experiments demonstrate the quality measurement features of the ScoreCard controls and software, including detection limit evaluation, dynamic range, spot-to-spot variation, and ratios. Additionally, we present data assessing the normalization method used by ScoreCard and demonstrating the capability of Lucidea Microarray ScoreCard to identify various experimental problems.

Control plate and control mRNA spike mixes
The ScoreCard control plate, a 384-well plate filled with control genetic content for deposition, is configured so that each of the 12 spotting pens deposits one replica of the controls onto each slide. The plate includes dynamic range controls, ratio controls, and positive and negative hybridization controls (Table 1).

Table 1. Control samples included in the Lucidea Microarray ScoreCard control plate
Spot position   Control sample             ID in ScoreCard
1               Positive control           1PC
2               Negative control           1NC
3               Dynamic range control 1    1DR
4               Dynamic range control 2    2DR
5               Dynamic range control 3    3DR
6               Dynamic range control 4    4DR
7               Dynamic range control 5    5DR
8               Dynamic range control 6    6DR
9               Ratio control 1            1RC
10              Ratio control 2            2RC
11              Ratio control 3            3RC
12              Ratio control 4            4RC
13              Negative control 2         2NC
14              Housekeeping gene 1        1HG
15              Housekeeping gene 2        2HG
16              Housekeeping gene 3        3HG
17              Housekeeping gene 4        4HG
18              Reserved for future use    1 Reserved
19              Reserved for future use    2 Reserved
20              Reserved for future use    3 Reserved
21              Reserved for future use    4 Reserved
22              Housekeeping gene 5        5HG
23              Negative control gene 3    3NG
24              Housekeeping gene 6        6HG
25              Housekeeping gene 7        7HG
26              Housekeeping gene 8        8HG
27              Housekeeping gene 9        9HG
28              Housekeeping gene 10       10HG
29              Housekeeping gene 11       11HG
30              Negative control 4         4NC
31              Negative control 5         5NC
32              Positive control 2         2PC

YIRs were selected to ensure no cross-hybridization with each other or with human genes. These regions are used to prepare dynamic range and ratio controls. They are amplified by PCR for control targets and transcribed in vitro to produce mRNA for a spike mix to be included in the labelling reactions (Table 2). When combined, these comprise a powerful internal control, allowing assessment of target attachment, RNA labelling, hybridization uniformity, detection limits, dynamic range, and expression ratios from each hybridized slide. Additionally, the ratio and dynamic range controls (Table 2) allow for evaluation of the effectiveness of data normalization (see Lucidea Microarray ScoreCard normalization method).

Table 2. Control mRNA spike mix components
Sample   Cy3:Cy5 ratio   Conc. in spike mix (pg/5 µl mix)   Relative abundance
                         Cy3         Cy5
1DR      1:1             33 000      33 000                 3.3%
2DR      1:1             10 000      10 000                 1%
3DR      1:1             1000        1000                   0.1%
4DR      1:1             330         330                    0.033%
5DR      1:1             100         100                    0.01%
6DR      1:1             33          33                     0.0033%
1RC      1:3             1000        1000                   NA
2RC      3:1             3000        3000                   NA
3RC      1:10            1000        10 000                 NA
4RC      10:1            10 000      1000                   NA
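The relative abundance values for the dynamic range controls in Table 2 are consistent with each spike mass being expressed as a percentage of 1 µg of sample mRNA per labelling reaction. The 1 µg figure is an assumption inferred from the table, not a value stated in the text, and the function name below is illustrative only. A minimal sketch of the arithmetic:

```python
# Assumption: 1 ug (1,000,000 pg) of sample mRNA per labelling reaction.
# This is inferred from the Table 2 values, not stated in the article.
SAMPLE_MRNA_PG = 1_000_000

# Spike mass per dye channel, in pg per 5 ul of spike mix (Table 2).
DYNAMIC_RANGE_PG = {
    "1DR": 33_000, "2DR": 10_000, "3DR": 1_000,
    "4DR": 330, "5DR": 100, "6DR": 33,
}

def relative_abundance_percent(spike_pg, sample_pg=SAMPLE_MRNA_PG):
    """Return the spike mass as a percentage of the sample mRNA mass."""
    return 100.0 * spike_pg / sample_pg

for sample_id, pg in DYNAMIC_RANGE_PG.items():
    print(f"{sample_id}: {relative_abundance_percent(pg):g}%")
```

If a different amount of sample mRNA is labelled, the same spike mix would represent a proportionally different relative abundance.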


An array with Lucidea Microarray ScoreCard control DNA targets and mRNA spikes is shown in Figure 1. For this array, the control plate was spotted in triplicate and hybridized with skeletal muscle cDNA probes, including the control spikes. The identities of the control elements are listed in Table 1. Additional information concerning the control mRNA spike mix components is provided in Table 2. Further details concerning the array can be found in the legend for Figure 1. In actual microarray experiments, the controls are spotted along with the experimental targets.



Fig 1.
Array with Lucidea Microarray ScoreCard control DNA targets and mRNA spikes. For this slide, the control plate was spotted in triplicate and hybridized with skeletal muscle cDNA probes that included the control spikes. The identities of the control elements are listed in Table 1. The target DNA in spots 3–8 corresponds to the mRNA control spikes included in the labelling reactions over a dynamic range of 33–33 000 pg/labelling reaction. The target DNA in spots 9–12 corresponds to the mRNA control spikes included in the two labelling reactions at different ratios. For example, the mRNA corresponding to spot 9 is present at a ratio of 1:3 Cy™3:Cy5 (see Table 2 for additional information).

Lucidea Microarray ScoreCard analysis software
The ScoreCard software has been designed to help the user evaluate data quality, validate system performance, and accurately normalize data within and across individual experiments. The ScoreCard specifications are defined exclusively for use with the control plate and data generated with the Molecular Dynamics microarray system. This software provides quality values to make relative measurements of data quality within and across slides, from one or multiple experiments.

The application displays results from data analysis in two window-views. The user switches between the two views by clicking on their corresponding radio buttons. The Graph window (Fig 2) displays a scatter plot of the control elements to allow visual evaluation of the data, along with a Ratio Analysis Table for evaluation of normalization accuracy. The Detection Limits Table allows the user to evaluate the sensitivity of the experiment, while the Housekeeping Gene Performance Table measures spot-to-spot signal variation across the slide, using a highly replicated housekeeping gene reference (a total of 48 replicates per slide). The Data View (not shown) allows the user to view the calculated quality metrics for all control targets. All quality metrics utilize user-definable thresholds to aid in data evaluation. Quality measurements outside of these thresholds are displayed in red.



Fig 2.
Lucidea Microarray ScoreCard Software Graphical User Interface (GUI) Graph Window.

In addition, the ScoreCard software calculates quality measurements that enable the user to validate the performance of the overall microarray system. Such measurements include variation of signal for replicate spots from pen to pen (% pen variation) and from spot set to spot set within a slide (% spot set variation). These measurements are displayed on request in the System Validation window shown in Figure 3. Quality measurements with values outside thresholds are displayed in red.
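The article does not give the formula behind the % pen variation and % spot set variation measurements. One common way to express variation among replicate signals is the coefficient of variation (sample standard deviation divided by the mean, as a percentage). The sketch below uses that definition as an assumption; the function names, the threshold, and the signal values are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical threshold; the article says only that thresholds are user-definable.
MAX_PERCENT_CV = 20.0

def percent_cv(signals):
    """Coefficient of variation (sample SD / mean) as a percentage."""
    return 100.0 * stdev(signals) / mean(signals)

def pen_variation(signal_by_pen):
    """Percent CV of one control's signal across pens, keyed by pen number."""
    return percent_cv(list(signal_by_pen.values()))

# Made-up signals for one control target, one replica per pen (12 pens).
example = dict(enumerate(
    [980, 1010, 1005, 995, 1020, 990, 1000, 1015, 985, 1008, 992, 1000],
    start=1,
))

cv = pen_variation(example)
status = "outside threshold" if cv > MAX_PERCENT_CV else "within threshold"
print(f"% pen variation: {cv:.1f} ({status})")
```

The same `percent_cv` helper could be applied to replicate spot sets within a slide by grouping signals by spot set instead of by pen.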



Fig 3.
ScoreCard GUI: System Validation window.

Lucidea Microarray ScoreCard normalization method
The primary challenge in microarray data analysis is to determine which genes are differentially expressed and by how much. This is complicated by the fact that there are both systematic and random errors associated with microarray measurements that cause signal variation from slide to slide. Such variations make comparison of differential expression ratios within and across experiments almost impossible.

Systematic errors that can bias microarray measurements involve characteristics of the fluorescent nucleotides used in labelling. The reverse transcriptases used do not incorporate Cy3 and Cy5 fluorescent nucleotides with the same efficiency. Furthermore, Cy3 and Cy5 each have distinct fluorescence efficiencies and detection characteristics, so that equal quantities of Cy3-labelled and Cy5-labelled probes do not produce equal signal intensities. Due to these sorts of errors and variations, the raw ratios do not cluster around a value of one and are not distributed normally. In addition, ratios appear to be less accurate and less precise at signal values approaching the detection limits of the system. These observations suggest that a new normalization procedure is required to transform raw signals or ratios into normalized ratios, so that a ratio of one correctly represents non-differentially expressed genes.

Normalization is traditionally conducted within a channel, using such correction factors as the average or median signal for all spots on an array, or the mean signal for housekeeping genes or positive controls. Such methods assume that a single, constant correction factor applies at all signal intensities. However, experiments in which a single mRNA sample is split and labelled with the two different dyes (so all genes should show no differential expression) still show distorted ratios. Furthermore, such experiments suggest that the differential expression ratios vary with Cy5 signal, and the amount of distortion in ratio varies from slide to slide. An exponential curve adequately describes the relationship between the log ratios and the Cy5 signal. As a result, a non-constant normalization method that takes this into account is the most effective method for correcting the raw ratios.

In view of the above observations and extensive experimentation, Lucidea Microarray ScoreCard software utilizes a proprietary normalization method based on a regression analysis that includes data from all the spots on the slide. The normalization in Lucidea Microarray ScoreCard is a two-colour procedure that corrects for the following artefacts:

1. A difference in ratio caused by systematically reduced signal in one channel relative to the other.

2. A distortion in the average ratio at low signal intensities, an effect whose severity varies from experiment to experiment.

This normalization method has been tested and proven to work well with different slide types and spotting chemistries. Experimental data suggest that it improves both the accuracy and the precision of the microarray measurements (Fig 4 and Fig 5).
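The ScoreCard normalization algorithm itself is proprietary and is not disclosed here. Purely to illustrate the idea of a signal-dependent correction, the sketch below fits an exponential curve of the form log ratio = a + b·exp(−c·Cy5) to all spots and subtracts the fitted curve. The functional form, the grid search over c, the function names, and the synthetic data are all assumptions made for this example and should not be read as the method the software uses.

```python
import math

def fit_exponential(cy5, log_ratio, decay_grid):
    """Fit log_ratio ~ a + b*exp(-c*cy5) by least squares.

    For each candidate decay constant c, a and b are found by ordinary
    linear regression on x = exp(-c*cy5); the best (a, b, c) is returned.
    """
    n = len(cy5)
    mean_y = sum(log_ratio) / n
    best = None
    for c in decay_grid:
        x = [math.exp(-c * s) for s in cy5]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # x is constant for this c, so b is undefined
        b = sum((xi - mean_x) * (yi - mean_y)
                for xi, yi in zip(x, log_ratio)) / sxx
        a = mean_y - b * mean_x
        sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, log_ratio))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]

def normalize(cy5, log_ratio, params):
    """Subtract the fitted curve so unchanged genes centre on log ratio 0."""
    a, b, c = params
    return [y - (a + b * math.exp(-c * s)) for s, y in zip(cy5, log_ratio)]

# Synthetic self-versus-self data: a constant channel offset (a = -0.2)
# plus a distortion that grows at low Cy5 signal (b = 0.5, c = 0.001).
cy5_signal = [100, 200, 500, 1000, 2000, 5000, 10000, 20000]
raw_log_ratio = [-0.2 + 0.5 * math.exp(-0.001 * s) for s in cy5_signal]

decay_grid = [10 ** (-k / 4) for k in range(8, 25)]  # 1e-2 down to 1e-6
params = fit_exponential(cy5_signal, raw_log_ratio, decay_grid)
normalized = normalize(cy5_signal, raw_log_ratio, params)
print([round(v, 6) for v in normalized])
```

In this toy model the constant term a plays the role of the first artefact (reduced signal in one channel), and the exponential term plays the role of the second (distortion at low signal intensities).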



Fig 4.
Comparison of the dynamic range control log ratios before and after normalization. Note the arrows indicating the zero position; a log ratio of 0 corresponds to a linear ratio of 1. All the dynamic range ratios are ~ 0 after normalization, as expected, indicating that the normalization procedure improved the accuracy of the log ratios. ULR = uncorrected log ratios; NLR = normalized log ratios.

Fig 5. Precision of dynamic range control log ratios before and after normalization. The standard deviation for the uncorrected log ratios (plotted in red) of the dynamic range controls (pg mRNA in spike) is compared with the corresponding normalized values (plotted in blue). After normalization, both the absolute standard deviation values and their confidence intervals (error bars) are decreased, indicating that the normalization procedure improved the precision of the log ratios. SDULR = standard deviation for the uncorrected log ratios; SDNLR = standard deviation for the normalized log ratios.

The ScoreCard software reports both normalized and observed ratios for the dynamic range and ratio controls, allowing easy comparison of the normalized values with the target values. These comparisons enable the user to evaluate if the normalization procedure improves the accuracy of the data.

Interpreting data quality using Lucidea Microarray ScoreCard
We have used Lucidea Microarray ScoreCard to evaluate the quality of various microarray experiments. In some cases, we have deliberately manipulated the experimental design to determine whether ScoreCard would indicate the problem, and whether the output values could help the investigator troubleshoot it. In Figure 6, we have compared the normalized ratios calculated by ScoreCard for a typical good-quality experiment with those from an experiment with unusually high Cy3 background. In the latter experiment, the ratio values for the dynamic range and ratio controls are outside the threshold range (1.5-fold difference from the target values) even after normalization. As a result, they are highlighted in red, indicating that the experiment should be rejected as invalid.
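The threshold check described above can be sketched as a comparison of each control's normalized ratio against its target ratio from Table 2. The sketch assumes that ratios are expressed as Cy3/Cy5 and that the 1.5-fold threshold is applied symmetrically (1.5 times too high or 1.5 times too low); neither detail is stated in the text. The function names and example values are hypothetical.

```python
# Target ratios from the Cy3:Cy5 ratio column of Table 2, written as Cy3/Cy5.
# The orientation of the ratio is an assumption.
TARGET_RATIO = {
    "1DR": 1.0, "2DR": 1.0, "3DR": 1.0, "4DR": 1.0, "5DR": 1.0, "6DR": 1.0,
    "1RC": 1 / 3, "2RC": 3.0, "3RC": 0.1, "4RC": 10.0,
}
FOLD_THRESHOLD = 1.5  # threshold quoted in the text

def fold_difference(observed, target):
    """Symmetric fold difference between two positive ratios (always >= 1)."""
    return max(observed / target, target / observed)

def out_of_threshold(normalized_ratios, fold=FOLD_THRESHOLD):
    """Return the controls whose normalized ratio misses its target."""
    return {
        control_id: ratio
        for control_id, ratio in normalized_ratios.items()
        if fold_difference(ratio, TARGET_RATIO[control_id]) > fold
    }

# Made-up normalized ratios for one slide.
slide = {"1DR": 1.05, "6DR": 1.9, "1RC": 0.30, "4RC": 4.0}
flagged = out_of_threshold(slide)
print("Controls outside threshold:", sorted(flagged))
```

Working with a symmetric fold difference is equivalent to comparing absolute log ratios, which treats over- and under-estimates of a target ratio alike.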

Fig 6. Comparison of the normalized ratios calculated by ScoreCard for a typical good-quality hybridization (panel A) and from a hybridization with high Cy3 background (panel B). Note the values for the normalized log ratios in the tables below each image. For experiment A, all normalized ratios are very close to the expected value (within the thresholds); however, in experiment B, all ratios are outside the thresholds and therefore highlighted in red.

In Figure 7, we have compared various ScoreCard quality measures from a successful hybridization (panel A) with those from a hybridization with poor hybridization uniformity (panel B). The ScoreCard measurements indicating spot-to-spot reproducibility and pen-to-pen variation are clearly outside the acceptable range and consequently highlighted in red.




Fig 7.
Comparison of ScoreCard quality measures from a successful hybridization (panel A) and from a hybridization with poor hybridization uniformity (panel B). The housekeeping gene performance and system validation quality measurements are shown. All measurements with values outside the appropriate thresholds are highlighted in red.

In Figure 8, we have presented ScoreCard data from an experiment in which we evaluated the effects of probe concentration on hybridization. In this experiment, we compared hybridizations using a typical probe consisting of 25 pmol of dye equivalents with hybridizations using probes of only 3 pmol of dye equivalents. For the latter hybridizations, the software detection limits measurements are outside the acceptable values and highlighted in red. This indicates that with probe dilution, the sensitivity of detection and the dynamic range of the system decrease.



Fig 8.
Effect of probe concentration on detection limit measurements. Both the housekeeping gene ratio and the dynamic range measurements have values outside the acceptable range (highlighted in red) when the hybridizing probe includes only 3 pmol of dye equivalents.

The above experimental cases demonstrate that Lucidea Microarray ScoreCard can be successfully used to identify a variety of problems in experimental data.

Conclusion
The Lucidea Microarray ScoreCard system is the first integrated analysis tool for evaluating data quality across microarray experiments. With Lucidea Microarray ScoreCard, the user can compare and troubleshoot microarray experiments and assess the performance of the microarray system by including predefined genetic targets and spikes on individual slides.

By combining a system of control elements with analysis software, Lucidea Microarray ScoreCard quickly and easily provides the user with:

• QC for all aspects of hybridization, including probe labelling, slide quality, and system performance

• a guide for ratio and detection limit analyses and dynamic range evaluation

• a standardized report for each experiment

• data normalization before summarization and mining

• data preparation for visualization and mining

ScoreCard can identify various problems with data quality, some of which are not easily detected by simple visual examination of microarray images. Consequently, it can improve and streamline the process of microarray data analysis by allowing only normalized data of acceptable quality to proceed from data extraction to data visualization and mining.

References
1. Bowtell, D. D. L., Nature Genetics 21, 25-32 (1999).
2. Brown, P. O. and D. Botstein, Nature Genetics 21, 33-37 (1999).
3. Barker, D. et al., Systems Approach to Fabricating and Analyzing DNA Microarrays. In Microarray Biochip Technology, published by BioTechniques (Schena, M., volume ed., ISBN 1-881299-37-6), Natick, MA, pp. 65-86 (2000).