Data Analysis/Informatics

General Considerations

The methods used for analysis of high-throughput screening data are as important as the screening protocols. There is no one correct method, and different possibilities should be evaluated for individual screens during the screen development process. General considerations are discussed during the initial data meeting and we are happy to schedule additional consultations as needed. In addition, analysis methods frequently utilized are discussed in the Small Molecule and RNAi general workflows.

Assay Evaluation and Optimization - Z' Factor

The Z' factor is useful during optimization and piloting for assessing the quality of assay conditions (Zhang et al., 1999). An assay can be considered validated for high-throughput screening once multiple independent experiments have yielded reproducible and suitable Z' factor values. Each experiment should be performed on at least one full 384-well plate on which half of the wells contain positive controls and half contain negative controls (192 wells each). This gives enough replicate wells to estimate the average and standard deviation of each control reliably. The experimental conditions to be used during the screen, including use of laboratory automation, should be mimicked to the greatest extent possible.

To quantitatively rank assay conditions, calculate Z' factors from the data collected:

Z' = 1 - 3 × (SD+ + SD-) / |Ave+ - Ave-|

where:

SD+ = positive control standard deviation

SD- = negative control standard deviation

Ave+ = positive control average

Ave- = negative control average
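As a minimal sketch of this calculation, the Python function below applies the standard Z' definition from Zhang et al. (1999) to two lists of control-well readings. The function name, the use of the sample standard deviation, and the example readings are assumptions made for illustration; they are not part of the ICCB-L templates.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Return Z' = 1 - 3 * (SD+ + SD-) / |Ave+ - Ave-|.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("positive and negative control averages are identical")
    return 1 - 3 * (stdev(positive) + stdev(negative)) / separation


# Made-up readings for illustration only; a real experiment would use
# all positive and negative control wells from the plate.
positive_wells = [980.0, 1010.0, 1005.0, 995.0, 1020.0, 990.0]
negative_wells = [105.0, 98.0, 110.0, 95.0, 102.0, 90.0]
print(f"Z' = {z_prime(positive_wells, negative_wells):.2f}")
```

Computing Z' separately for each candidate assay condition gives a single number per condition that can be compared directly.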

For small molecule screening assays:

Z' Value          Assay Fitness Indicator
1 > Z' > 0.8      An extremely robust assay
0.8 > Z' > 0.6    A robust assay
0.6 > Z' > 0.4    Acceptable, but identification of positives will benefit significantly from any improvement
Z' = 0.4          The minimum recommended for high-throughput small molecule screening

This table may differ slightly from published recommendations; it reflects the general experience with small molecule screens at ICCB-Longwood. Screening results rarely achieve the quality seen during piloting with controls, and cell-based assays frequently cannot achieve as high a Z' factor as biochemical assays.

For RNAi screening assays, the experience of ICCB-L screeners has been that Z' factor values are often below 0.5 even under the most optimized conditions. These RNAi screening assays have nonetheless been productive.

Zhang J, Chung T, Oldenburg KR. A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J Biomol Screen. 1999;4(2):67-73.

Assay Variability

Signal-to-background ratio (S/B) and well-to-well variability (coefficient of variation, CV) are both important to consider. As assay variability increases, the S/B ratio must also increase for the screen to be successful. We recommend using a positive control condition to determine the S/B ratio. To determine S/B, fill a plate with reagents using the same equipment that will be used for the screen. Add several dilutions of the positive control to several wells each, and determine whether the positive control can be reproducibly detected above the well-to-well variation. These data will provide an estimate of the potential false-positive and false-negative rates of the assay.
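The two quantities can be computed in a few lines. The sketch below assumes S/B is the ratio of the average positive control signal to the average background signal, and CV is the sample standard deviation divided by the average, expressed as a percentage. The function names, dilution labels, and readings are hypothetical.

```python
from statistics import mean, stdev


def signal_to_background(signal_wells, background_wells):
    """S/B: average signal divided by average background."""
    return mean(signal_wells) / mean(background_wells)


def percent_cv(wells):
    """Well-to-well variability as a percentage of the average."""
    return 100 * stdev(wells) / mean(wells)


# Made-up readings: background wells plus three positive control dilutions.
background = [100.0, 108.0, 95.0, 103.0]
dilutions = {
    "1:1": [950.0, 1010.0, 985.0, 1002.0],
    "1:10": [410.0, 385.0, 430.0, 402.0],
    "1:100": [130.0, 118.0, 141.0, 122.0],
}

print(f"background CV = {percent_cv(background):.1f}%")
for label, wells in dilutions.items():
    ratio = signal_to_background(wells, background)
    print(f"{label}: S/B = {ratio:.1f}, CV = {percent_cv(wells):.1f}%")
```

A dilution whose S/B is small relative to the CV of the wells is one where the positive control is unlikely to be detected reproducibly.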

Data Handling

After optimizing the assay and performing an automated Z' factor experiment, please schedule a meeting with the ICCB-L Data Curator (Jen Splaine) and Data Analyst (David Wrobel). The goal of this meeting is to discuss data handling throughout the screen (raw screening data, potential analysis methods, hit criteria, annotation of results), the visualizations that will be used to assess screen performance, and Screensaver, ICCB-Longwood's laboratory information management system.

Regular data submission and continued assay robustness are essential for continued screening of ICCB-L libraries. 

Detailed instructions for data formatting and deposition into Screensaver will be provided after the initial data meeting and are available here. You will also be provided with a primary screen report template (Excel file), to which you will add detailed information about the assay protocol, controls, biosources, analysis methods and criteria for scoring positives. Once you return the annotated screen results and the completed primary screen report, ICCB-L staff will deposit your data into Screensaver. At that point you will receive instructions on how to view your data and compare your results to those of other screens (depending on your data sharing level). You may submit a cherry pick request for follow-up testing of positives in your screen only after submitting the annotated screen result data and a completed primary screen report. Guidelines for requesting small molecule cherry picks are available here, and Functional Genomics cherry pick guidelines are available here.

Raw screening data should be saved directly to your permanent storage server (home folder, collaboration folder, hard drive). Please do not store your data on ICCB-L computers, as these hard drives are regularly cleared.

Data Collection and Analysis

The formatted raw data Microsoft Excel templates provided by ICCB-L staff are set up so that screeners can easily organize and analyze their screening data. Other programs in use include Dotmatics Vortex and the R statistical package; please see the available analysis software in the left menu for more options. Some general considerations are highlighted below.

While it is the responsibility of individual researchers to analyze their screen results and establish criteria for scoring positives, ICCB-Longwood staff can provide advice during this process.

Perform Assay in Replicate: Most assays designed for high-throughput screening have considerable inherent variability and error. For this reason, it is strongly recommended that all small molecule screens be performed in technical duplicate and RNAi screens be performed in technical triplicate. Replicate data points enable the researcher to focus on positive results detected in all replicates, thus reducing the false positive rate.
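Keeping only the positives detected in every replicate amounts to intersecting the per-replicate hit lists. The sketch below assumes each replicate is a mapping from well identifier to a score, that higher scores indicate a positive, and that a single cutoff applies to all replicates; the function name, cutoff, and scores are hypothetical.

```python
def positives_in_all_replicates(replicate_scores, threshold):
    """Return the wells whose score meets the threshold in every replicate.

    replicate_scores: list of dicts, one per replicate, mapping well -> score.
    Assumption: a score at or above the threshold counts as a positive.
    """
    hit_sets = [
        {well for well, score in scores.items() if score >= threshold}
        for scores in replicate_scores
    ]
    if not hit_sets:
        return []
    return sorted(set.intersection(*hit_sets))


# Made-up scores for three wells measured in duplicate.
replicate_a = {"A01": 3.5, "A02": 0.2, "A03": 4.1}
replicate_b = {"A01": 3.9, "A02": 3.2, "A03": 1.0}
print(positives_in_all_replicates([replicate_a, replicate_b], threshold=3.0))
```

Wells A02 and A03 each pass the cutoff in only one replicate, so they are excluded; this is how replicates reduce the false positive rate.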

Control Wells in Assay Plates: Assay-specific positive and negative controls are an essential part of a well-designed assay. It is advisable to include positive and negative controls on every assay plate and to use as many available wells as possible; these are referred to as plate-based controls. They are essential for identifying plate-to-plate variability and detecting assay background levels, and are frequently used in data normalization and/or analysis.

Normalize Plate Readout: Many assays involve a time-dependent readout, so background and intensity levels vary over time and from plate to plate. A screen with an appreciable change in signal intensity and background from plate to plate should first be scaled to fold induction by dividing the observed value in each well by the plate experimental well median or the plate control well median, depending upon experimental design. In general, the plate median is more reliable than the plate mean for re-scaling or normalization, as it is less affected by outlier values. Screens without appreciable time-based or plate-based variance in signal intensity should forgo the fold-induction calculation and simply be normalized plate by plate by calculating the z-score or robust z-score. These z-scores can then be used as an indication of the probability that a screening positive is not due to background noise.
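The three per-plate calculations mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the ICCB-L analysis method: the z-score uses the plate mean and sample standard deviation, and the robust z-score uses the plate median and the median absolute deviation (MAD) scaled by 1.4826, a common convention. The function names and well values are hypothetical.

```python
from statistics import mean, median, stdev


def fold_induction(wells, reference_wells=None):
    """Divide each well by the median of the reference wells.

    With no reference wells given, the plate's own wells are the reference
    (experimental well median); pass control wells to use the control median.
    """
    reference = median(wells if reference_wells is None else reference_wells)
    return [value / reference for value in wells]


def z_scores(wells):
    """(value - plate mean) / plate sample standard deviation."""
    centre, spread = mean(wells), stdev(wells)
    return [(value - centre) / spread for value in wells]


def robust_z_scores(wells):
    """(value - plate median) / (1.4826 * MAD).

    Fails with ZeroDivisionError if the MAD is zero (e.g. many identical wells).
    """
    centre = median(wells)
    mad = median([abs(value - centre) for value in wells])
    return [(value - centre) / (1.4826 * mad) for value in wells]


# Made-up readings from one plate; the fifth well is an outlier.
plate = [100.0, 110.0, 95.0, 105.0, 400.0, 98.0]
print([round(v, 2) for v in fold_induction(plate)])
print([round(v, 2) for v in z_scores(plate)])
print([round(v, 2) for v in robust_z_scores(plate)])
```

Because the outlier inflates the mean and standard deviation but barely moves the median and MAD, the robust z-score separates it from the remaining wells more clearly than the ordinary z-score does.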