Prediction of regulation relationship between protein interactions in signaling networks

https://doi.org/10.1016/j.bbrc.2013.09.093

Highlights

  • We propose three methods to predict regulation relations of protein interactions.

  • We detect hundreds of domain pairs useful for the regulation relation prediction.

  • We establish the first predicted human interaction network with regulation relations.

Abstract

Discovering the regulation relationships of protein interactions is crucial for mechanistic research on signaling networks. Bioinformatics methods can accelerate this discovery by distinguishing activation relations from inhibition relations. In this paper, we describe a novel method to predict the regulation relations of protein interactions in signaling networks. We detected 4,417 domain pairs that were significantly enriched in the activation or inhibition dataset. Three machine learning methods, logistic regression, support vector machines (SVMs), and naïve Bayes, were explored as classifier models. The predictive power of the three models was evaluated by 5-fold cross-validation and on an independent test dataset. The areas under the receiver operating characteristic curve for the logistic regression, SVM, and naïve Bayes models were 0.946, 0.905 and 0.809, respectively. Finally, the logistic regression classifier was applied to the human proteome-wide interaction dataset, and regulation relations were predicted for 2,591 interactions, 2,048 as activation and 543 as inhibition. This domain-based model can be used to identify the regulation relations of protein interactions and, furthermore, to reconstruct signaling pathways.

Introduction

With the development of high-throughput technologies, large-scale protein–protein interaction (PPI) data for multiple species have been produced, providing the basis for investigating protein function and dynamics [1], [2], [3], [4], [5], [6]. An important area of investigation is the discovery of potential signaling pathways from protein interactions, in order to understand their roles in signal transduction, gene regulation and disease. The typical experimental method for inferring the regulation relations between pathway components is to perturb cells with molecular interventions [7], [8]. Determining molecular mechanisms and regulation relationships in this way requires many experiments, which is expensive, time-consuming and error-prone.

Several groups have developed bioinformatics methods to infer signaling pathways [9], [10], [11], [12], [13], [14]. For example, Steffen et al. developed a computational approach to generate static models of signal transduction networks from large-scale two-hybrid screens and expression profiles [9]. Silverbush et al. [10] and Gitter et al. [11] presented several algorithms to discover high-confidence pathways. Shlomi et al. presented a comprehensive framework, QPath, which uses homologous pathway queries to identify biologically significant pathways and their functions [12]. We have also proposed two methods to predict the directionality between pairs of interacting proteins, based on domains and functional annotations [13], [14]. These methods achieve good performance on a subset of protein interaction datasets. However, it is still difficult to determine the regulation relationships of protein interactions in signaling pathways. Given a pair of interacting proteins, we can predict the direction of signal flow through it using the methods proposed in [13], [14], but we cannot distinguish whether its regulation relation is activation or inhibition. Therefore, new bioinformatics methods are needed to predict the regulation relations of protein interactions.

In this paper, we introduce a novel method to predict the regulation relationships of protein interactions in signaling networks according to their constituent domains. First, we propose a measure, Enrichment_ratio, to identify the domain pairs significantly enriched in the activation or inhibition dataset. Then, we train classifiers based on three machine learning methods (logistic regression, SVM and naïve Bayes) with the activation dataset and the inhibition dataset. Next, we evaluate these methods by 5-fold cross-validation and on an independent test dataset. Finally, we apply the logistic regression method to predict the regulation relations in the human proteome-wide interactions.
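The evaluation step can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the helper names `k_fold_indices` and `roc_auc`, the toy labels and scores, and the encoding of activation as 1 and inhibition as 0 are all assumptions made for illustration. The sketch shows one way to split samples into 5 folds and to compute the area under the ROC curve from classifier scores, using the rank-based (Mann-Whitney) formulation.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical toy data: 1 = activation, 0 = inhibition.
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.75, 0.65, 0.2]

folds = k_fold_indices(len(labels), k=5)
for fold_id, test_idx in enumerate(folds):
    train_idx = [i for f in folds if f is not test_idx for i in f]
    # A real pipeline would fit a classifier on train_idx and score test_idx here.
    print(fold_id, sorted(test_idx), len(train_idx))

print(roc_auc(labels, scores))
```

In practice the scores would come from a trained classifier evaluated on each held-out fold, and the per-fold AUC values would be averaged; the fixed scores here only demonstrate the AUC calculation itself.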

Section snippets

Extraction of signaling networks in multiple species

As a classical and well-known pathway database, KEGG (Kyoto Encyclopedia of Genes and Genomes) contains manually annotated pathways based on biochemical evidence from the literature, including a large number of signaling and metabolic pathways [15]. All the signaling networks of human, mouse, rat, fly and yeast were downloaded from KEGG. From these signaling networks, 1,893 protein interactions were extracted with their regulation relationships, including 1,554 in the category of activation and

Extraction of domain pairs enriched in activation/inhibition dataset

Domains are the structural and functional units of proteins. Most proteins interact with each other through their domains. Therefore, it is useful to study PPIs at the level of domains [18]. In Fig. 1, we give an example to illustrate the domain pairs contained in a protein interaction. Protein A contains three domains D1, D2 and D3, and Protein B contains two domains E1 and E2. In principle, the domains contained in Protein A and the domains contained in Protein B
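To make the Fig. 1 example concrete, the following minimal Python sketch enumerates the candidate domain pairs for one interaction. It assumes that every domain of Protein A is paired with every domain of Protein B, which is an interpretation of the example rather than a statement of the authors' exact procedure; the function name `domain_pairs` is hypothetical.

```python
from itertools import product

def domain_pairs(domains_a, domains_b):
    """Return every (domain of A, domain of B) combination for one interaction."""
    return list(product(domains_a, domains_b))

# Domain names taken from the Fig. 1 example in the text.
protein_a = ["D1", "D2", "D3"]
protein_b = ["E1", "E2"]

pairs = domain_pairs(protein_a, protein_b)
print(len(pairs))
for pair in pairs:
    print(pair)
```

Under this assumption, an interaction between a protein with three domains and a protein with two domains contributes 3 × 2 = 6 candidate domain pairs.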

Discussion and conclusions

The regulation relationship is one of the most important features of protein interactions in signaling networks. Determining the regulation relations of protein interactions is crucial for revealing potential signaling pathways and constructing signaling networks. Reconstruction of signaling networks from protein interactions might be applied to understanding signal transduction processes, complex drug actions, and dysfunctional signaling in diseased cells [21].

In this paper, we proposed

Acknowledgments

We thank Drs. Jiyang Zhang, Tengjiao Wang and Changming Xu for their excellent advice and assistance as well as all the members in the Bioinformatics Laboratory, College of Mechanical & Electronic Engineering and Automatization, National University of Defense Technology for helpful discussions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 31000591, 31000587, and 31171266).

References (21)

  • W. Liu et al., Proteome-wide prediction of signal flow direction in protein interaction networks based on interacting domains, Mol. Cell. Proteomics (2009)
  • U. Stelzl et al., A human protein–protein interaction network: a resource for annotating the proteome, Cell (2005)
  • H.W. Mewes et al., MIPS: a database for genomes and protein sequences, Nucleic Acids Res. (2002)
  • P. Uetz et al., A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae, Nature (2000)
  • T. Ito et al., A comprehensive two-hybrid analysis to explore the yeast protein interactome, Proc. Natl. Acad. Sci. USA (2001)
  • S. Li et al., A map of the interactome network of the metazoan C. elegans, Science (2003)
  • L. Giot et al., A protein interaction map of Drosophila melanogaster, Science (2003)
  • S. Peri et al., Development of human protein reference database as an initial platform for approaching systems biology in humans, Genome Res. (2003)
  • K. Sachs et al., Causal protein-signaling networks derived from multiparameter single-cell data, Science (2005)
  • D. Silverbush et al., Optimally orienting physical networks, J. Comput. Biol. (2011)
There are more references available in the full text version of this article.