Chapter four - Using DNA Microarrays to Assay Part Function

https://doi.org/10.1016/B978-0-12-385075-1.00004-4

Abstract

In recent years, the capability of synthetic biology to design large genetic circuits has dramatically increased due to rapid advances in DNA synthesis technology and development of tools for large-scale assembly of DNA fragments. Large genetic circuits require more components (parts), especially regulators such as transcription factors, sigma factors, and viral RNA polymerases to provide increased regulatory capability, and also devices such as sensors, receivers, and signaling molecules. All these parts may have a potential impact upon the host that needs to be considered when designing and fabricating circuits. DNA microarrays are a well-established technique for global monitoring of gene expression and therefore are an ideal tool for systematically assessing the impact of expressing parts of genetic circuits in host cells. Knowledge of part impact on the host enables the user to design circuits from libraries of parts taking into account their potential impact and also to possibly modify the host to better tolerate stresses induced by the engineered circuit. In this chapter, we present the complete methodology of performing microarrays from choice of array platform, experimental design, preparing samples for array hybridization, and associated data analysis including preprocessing, normalization, clustering, identifying significantly differentially expressed genes, and interpreting the data based on known biology. With these methodologies, we also include lists of bioinformatic resources and tools for performing data analysis. The aim of this chapter is to provide the reader with the information necessary to be able to systematically catalog the impact of genetic parts on the host and also to optimize the operation of fully engineered genetic circuits.

Introduction

The focus of synthetic biology has been the design and implementation of small-scale genetic circuits (Elowitz and Leibler, 2000, Ham et al., 2008, Tabor et al., 2009), including the transplantation and reconstruction of small metabolic pathways in suitable hosts (Lee et al., 2008, Steen et al., 2010). The focus on small systems reflected, in part, the laborious processes of DNA fragment construction and assembly required to optimize designed systems. The rapid expansion of DNA synthesis capacity (Czar et al., 2009, Tian et al., 2009) and the development of simple protocols for large-scale assembly of DNA fragments (Gibson et al., 2008, Gibson et al., 2009) have broadened the potential focus of synthetic biology. However, larger synthetic circuits require more components (Voigt, 2006), and their reliable operation requires accurate assessments of the impact of each of these components on the host cell processes. When such circuits overburden the host, mutations will rapidly accumulate to relieve the stresses that are introduced. An accurate assessment of the impact of synthetic circuits on host physiology will enable an informed choice of circuits for implementation.

The components of synthetic circuits and their impact on the host can be broadly classified into two categories: (1) Regulatory components comprise transcription factors, sigma factors, and viral RNA polymerases, which enable controlled expression of individual circuit components. Importantly, the DNA sequence specificities of the regulators may result in aberrant and possibly deleterious gene expression within the host. (2) Circuit devices comprise sensors, receivers, signaling molecules, enzymes, etc. These components receive information and then command the cell to perform a task, such as producing chemicals and fuels, secreting proteins, and sending out communication signals. Individual circuit components may be deleterious to the host when overexpressed. Additionally, certain combinations of components may be deleterious even when the individual components have no deleterious effect. Consequently, it is important to monitor and catalog the impact of individual circuit components, and of combinations of components, on the host in order to facilitate the design process, the choice of components for a particular circuit, and troubleshooting of large synthetic circuits. In addition, a good understanding of the impact of components may facilitate modification of the host to better tolerate the circuit. For example, high-level expression of some proteins can result in the accumulation of unfolded products within the cytoplasm, triggering the cytoplasmic heat shock response. This can be relieved by overexpression of cytosolic chaperones.

DNA microarrays provide an easy way to monitor changes in gene expression in the host (Rhodius et al., 2002). They can be used to pinpoint the effects of regulator parts of genetic circuits and provide a useful tool for identifying stress-response pathways that are upregulated in response to circuit devices. In this chapter, we will describe the process of performing microarray experiments and associated data analysis to monitor gene expression. The overall process is illustrated in Fig. 4.1 and involves the following steps: (1) Experimental setup: the biological question addressed, experimental design, and performing the microarray experiment(s); (2) Preprocessing: data quality control and normalization prior to analysis; and (3) Analysis: the statistical tools that identify significantly differentially expressed genes, clustering to identify coregulated genes or similar datasets, and functional annotation to identify common and/or enriched properties of the gene products in the final datasets. We discuss each step and indicate the utility of this technology for synthetic biology.

Section snippets

Different Microarray Platforms

The selection of microarray platforms is summarized in Table 4.1. Early microarrays used cDNA libraries, oligonucleotides, or PCR products fabricated by individual laboratories and printed onto polylysine- or epoxy-treated glass slides. Although inexpensive, the process is laborious, and results can be inconsistent and are usually limited to a single datapoint for each open reading frame (ORF). Commercially available platforms range from low density arrays with a single printed oligonucleotide probe

Experimental Design

Microarray experiments can easily generate large lists of differentially expressed genes, making it difficult to unravel the underlying biology. Consequently, careful experimental design is critical for interpretable data. Optimal datasets are those in which only a few cellular systems are disrupted by a designated perturbation (e.g., induced overexpression of a candidate gene). Here, the response usually occurs within a short time frame of the perturbation and therefore can be monitored by following gene

Experimental Variation

Experimental variation derives from either biological or technical issues. Biological variability is more difficult to control and can come from issues with the biological sample, the growth conditions of the culture, and alteration in gene expression levels. We discuss each of these in turn and then conclude with the technical issues contributing to variability. Issues related to the biological sample itself are best described by its complexity, quantity, and quality.

  1. Complexity relates to

Sample Preparation

Sample preparation includes RNA harvesting, cDNA synthesis and labeling with fluorescent dyes, and sample hybridization on the microarray. Many array manufacturers recommend protocols for these steps, some of which are specific to the microarray platform (e.g., Affymetrix). In addition, there are many published protocols, for example, Beyhan and Yildiz, 2007, Botwell and Sambrook, 2003, Rhodius and Wade, 2009. Here, we will discuss important issues of sample preparation and present protocols

Microarray Preprocessing

For two-color arrays, each slide is excited at two wavelengths, 532 nm for the Cy3 (green)-labeled reference sample and 650 nm for the Cy5 (red)-labeled experimental sample, to measure the fluorescence of the hybridized samples to each probe (feature) on the array. Note that during array scanning it is important to individually adjust the scanning voltage at each wavelength, which in turn controls the detected fluorescent intensity. This is required to (1) approximately balance the signal
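As a rough illustration of a typical preprocessing step for two-color data, the sketch below converts paired Cy5 and Cy3 intensities into log2 ratios and centers them on the array median. Global median centering is a simple stand-in chosen here for illustration and is not necessarily the normalization the chapter recommends. The function name, the `floor` parameter, and the intensity values are all hypothetical.

```python
import math
import statistics


def normalized_log_ratios(cy5, cy3, floor=1.0):
    """Return median-centered log2(Cy5/Cy3) ratios for one array.

    cy5, cy3: background-corrected intensities, one value per feature.
    floor: assumed minimum intensity, used to avoid taking the log of
           zero or negative values (an illustrative choice).
    """
    if len(cy5) != len(cy3):
        raise ValueError("channels must have the same number of features")
    ratios = [
        math.log2(max(r, floor) / max(g, floor))
        for r, g in zip(cy5, cy3)
    ]
    # Assumes most features are unchanged, so the median ratio should be 0.
    center = statistics.median(ratios)
    return [m - center for m in ratios]


# Hypothetical intensities: the Cy5 channel is uniformly twice as bright
# (a scanner or labeling imbalance), except for the last feature.
cy3 = [100.0, 200.0, 400.0, 800.0, 100.0]
cy5 = [200.0, 400.0, 800.0, 1600.0, 800.0]
log_ratios = normalized_log_ratios(cy5, cy3)
```

After centering, the four features that differ only by the channel imbalance have a log ratio of 0, while the last feature retains a log2 ratio of 2, corresponding to a fourfold change relative to the rest of the array.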

Clustering

Clustering is an exploratory data analysis process for datasets containing multiple array experiments. It is used to: discover patterns in the data; group “similar” patterns together either by clustering genes (rows) with similar expression profiles or by clustering arrays/experiments (columns) with similar profiles, or both genes and experiments; reduce the complexity of the data into several distinct patterns; and provide a method to order and organize the data (reviewed in Boutros and Okey,
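To make the idea of grouping genes (rows) with similar expression profiles concrete, here is a minimal sketch of average-linkage hierarchical clustering that uses 1 minus the Pearson correlation as the distance. This is one common approach, not necessarily the method the chapter goes on to describe. The gene names and log ratios are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # flat profile: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def cluster_genes(profiles, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - r.

    profiles: dict mapping gene name -> list of log ratios across arrays.
    Returns a sorted list of clusters, each a sorted list of gene names.
    """
    genes = list(profiles)
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in genes for b in genes
    }
    clusters = [[g] for g in genes]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        # Merge the closest pair of clusters.
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical log ratios over a four-point time course.
profiles = {
    "geneA": [0.1, 1.0, 2.1, 3.0],
    "geneB": [0.0, 0.9, 2.0, 3.2],
    "geneC": [3.0, 2.0, 1.1, 0.2],
    "geneD": [2.8, 2.1, 0.9, 0.0],
}
clusters = cluster_genes(profiles, n_clusters=2)
```

With these profiles, the two induced genes fall into one cluster and the two repressed genes into the other. Transposing the input (arrays as keys, genes as values) would cluster experiments (columns) in the same way.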

Differential Expression Analysis

Specialized statistical methods are required to identify significantly differentially expressed genes from microarray data (Allison et al., 2006, Dudoit et al., 2003). Use of these methods is essential to reliably identify genes that are perturbed by expression of parts in a host. Data from a single microarray experiment without further experimental validation is insufficient to reliably identify differentially expressed genes. This is because application of a fold cutoff does not take into
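Because thousands of genes are tested at once, per-gene p-values are usually corrected for multiple testing. The sketch below implements the Benjamini-Hochberg false discovery rate adjustment, which appears in this chapter's reference list. Whether the chapter recommends this particular procedure is not visible in the truncated text, so this is an assumption. The gene names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from some upstream statistical test.
p_values = {
    "geneA": 0.001,
    "geneB": 0.008,
    "geneC": 0.039,
    "geneD": 0.041,
    "geneE": 0.6,
}
q = dict(zip(p_values, benjamini_hochberg(list(p_values.values()))))
significant = sorted(g for g, v in q.items() if v <= 0.05)
```

In this toy example, geneC and geneD have raw p-values below 0.05 but are no longer called significant once the false discovery rate is controlled at 5%.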

Data Analysis: Understanding the Perturbation

Biological interpretation of candidate gene lists identified through clustering and by significant differential expression is essential in order to identify the biological processes or systems perturbed by expression of parts. Expression of regulators may result in general aberrant ectopic gene expression due to recognition of miscellaneous sites throughout the genome. However, expression of circuit devices may target specific cellular processes that will likely be reflected in the expression
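One common way to ask whether a candidate gene list points to a specific cellular process is a hypergeometric enrichment test on functional annotations. The sketch below is a generic illustration of that test and is not taken from the chapter. The function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(genome_size, category_size, list_size, overlap):
    """Probability of seeing at least `overlap` category genes by chance.

    Models drawing `list_size` genes without replacement from a genome of
    `genome_size` genes, of which `category_size` carry the annotation.
    """
    upper = min(category_size, list_size)
    total = comb(genome_size, list_size)
    tail = sum(
        comb(category_size, k) * comb(genome_size - category_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 4000 genes in the genome, 40 annotated as heat shock,
# 100 differentially expressed, 10 of which are heat shock genes.
p = enrichment_p_value(4000, 40, 100, 10)
```

Only about one heat shock gene would be expected in a random list of 100 genes here, so an overlap of 10 gives a very small p-value. In practice the test is repeated across many annotation categories, so these p-values also need correction for multiple testing.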

Closing Remarks

The key to successful microarray experiments is careful experimental design that enables the user to capture the direct effects of the introduced perturbation, in this case, the effect of expressing parts within a host. Systematic analysis of multiple parts requires the use of carefully defined growth conditions and control samples to enable cross-comparison of expression data, both within and between laboratories. Equally important is careful data analysis in order to maximize interpretation of

References (73)

  • M. Arifuzzaman et al. Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res. (2006)
  • F.M. Ausubel et al. Current Protocols in Molecular Biology. (1998)
  • P. Baldi et al. A Bayesian framework for the analysis of microarray expression data: Regularized t-test and statistical inferences of gene changes. Bioinformatics (2001)
  • C.A. Ball et al. Submission of microarray data to public repositories. PLoS Biol. (2004)
  • T. Bammler et al. Standardizing global gene expression analysis between laboratories and across platforms. Nat. Methods (2005)
  • C.L. Beisel et al. Base pairing small RNAs and their roles in global regulatory networks. FEMS Microbiol. Rev. (2010)
  • Y. Benjamini et al. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. B Methodol. (1995)
  • S. Beyhan et al. Bacterial gene expression analysis using microarrays. J. Vis. Exp. (2007)
  • B.M. Bolstad et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (2003)
  • D. Botwell et al. DNA Microarrays: A Molecular Cloning Manual. (2003)
  • P.C. Boutros et al. Unsupervised pattern recognition: An introduction to the whys and wherefores of clustering microarray data. Brief. Bioinform. (2005)
  • W.S. Branham et al. Elimination of laboratory ozone leads to a dramatic improvement in the reproducibility of microarray gene expression measurements. BMC Biotechnol. (2007)
  • A. Brazma et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet. (2001)
  • G. Butland et al. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature (2005)
  • B.K. Cho et al. The transcription unit architecture of the Escherichia coli genome. Nat. Biotechnol. (2009)
  • S. Datta. Empirical Bayes screening of many p-values with applications to microarray studies. Bioinformatics (2005)
  • P. D'Haeseleer. How does gene expression clustering work? Nat. Biotechnol. (2005)
  • K.A. Do et al. A Bayesian mixture model for differential gene expression. J. R. Stat. Soc. C Appl. Stat. (2005)
  • S. Dudoit et al. Multiple hypothesis testing in microarray experiments. Stat. Sci. (2003)
  • M.B. Eisen et al. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA (1998)
  • M.B. Elowitz et al. A synthetic oscillatory network of transcriptional regulators. Nature (2000)
  • T.L. Fare et al. Effects of atmospheric ozone on microarray data quality. Anal. Chem. (2003)
  • M.J. Filiatrault et al. Transcriptome analysis of Pseudomonas syringae identifies new genes, noncoding RNAs, and antisense activity. J. Bacteriol. (2010)
  • H. Gao et al. Microarray-based analysis of microbial community RNAs by whole-community RNA amplification. Appl. Environ. Microbiol. (2007)
  • D.G. Gibson et al. One-step assembly in yeast of 25 overlapping DNA fragments to form a complete synthetic Mycoplasma genitalium genome. Proc. Natl. Acad. Sci. USA (2008)
  • D.G. Gibson et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods (2009)