Abstract
Progress in experimental procedures has led to rapid availability of Omics profiles. Various open-access as well as commercial tools have been developed for storage, analysis, and interpretation of transcriptomics, proteomics, and metabolomics data. Generally, major analysis steps include data storage, retrieval, preprocessing, and normalization, followed by identification of differentially expressed features, functional annotation on the level of biological processes and molecular pathways, as well as interpretation of gene lists in the context of protein–protein interaction networks. In this chapter, we discuss a sequential transcriptomics data analysis workflow utilizing open-source tools, specifically exemplified on a gene expression dataset on familial hypercholesterolemia.
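The core of the step sequence named above (preprocessing, normalization, then identification of differentially expressed features) can be sketched in a few lines. This is a minimal illustration, not the chapter's actual workflow, and it rests on several assumptions: per-sample median centering stands in for a real normalization method, Welch's t-statistic stands in for a full differential expression test with multiple-testing correction, and all function names, feature names, and intensity values are hypothetical toy data.

```python
import math
import statistics


def log2_transform(matrix):
    """Log2-transform raw intensities (rows = features, columns = samples)."""
    return [[math.log2(v) for v in row] for row in matrix]


def median_center(matrix):
    """Subtract each sample's median so all samples share a common center."""
    n_samples = len(matrix[0])
    medians = [statistics.median(row[j] for row in matrix) for j in range(n_samples)]
    return [[row[j] - medians[j] for j in range(n_samples)] for row in matrix]


def welch_t(a, b):
    """Welch's t-statistic for two groups; returns 0.0 if both groups are constant."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / se


def rank_features(names, matrix, case_idx, control_idx):
    """Return (name, log2 fold change, t-statistic) tuples sorted by |t|, largest first."""
    results = []
    for name, row in zip(names, matrix):
        case = [row[j] for j in case_idx]
        control = [row[j] for j in control_idx]
        log_fc = statistics.mean(case) - statistics.mean(control)
        results.append((name, log_fc, welch_t(case, control)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical toy data: 4 features x 6 samples (3 cases followed by 3 controls).
names = ["geneA", "geneB", "geneC", "geneD"]
raw = [
    [800, 820, 790, 200, 210, 190],  # clearly higher in cases
    [100, 110, 95, 105, 98, 102],
    [50, 55, 48, 52, 47, 51],
    [400, 390, 410, 405, 395, 400],
]

normalized = median_center(log2_transform(raw))
ranked = rank_features(names, normalized, case_idx=[0, 1, 2], control_idx=[3, 4, 5])

for name, log_fc, t in ranked:
    print(f"{name}: log2FC={log_fc:+.2f}, t={t:+.2f}")
```

With so few features, median centering is itself shifted by the one strongly changed feature, which is one reason real analyses normalize across thousands of features and use established methods. The ranked list would then feed the downstream steps of functional annotation and network interpretation.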
© 2011 Springer Science+Business Media, LLC
Cite this protocol
Mühlberger, I., Wilflingseder, J., Bernthaler, A., Fechete, R., Lukas, A., Perco, P. (2011). Computational Analysis Workflows for Omics Data Interpretation. In: Mayer, B. (eds) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_17
DOI: https://doi.org/10.1007/978-1-61779-027-0_17
Publisher Name: Humana Press
Print ISBN: 978-1-61779-026-3
Online ISBN: 978-1-61779-027-0
eBook Packages: Springer Protocols