Diversified Control Paths: A Significant Way Disease Genes Perturb the Human Regulatory Network

  • Bingbo Wang ,

    w_bingbo@163.com (BBW); lgao@mail.xidian.edu.cn (LG)

    Affiliation School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China

  • Lin Gao ,

    Affiliation School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China

  • Qingfang Zhang,

    Affiliation School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China

  • Aimin Li,

    Affiliation School of Computer Science and Technology, Xi’an University of Technology, Xi'an, People’s Republic of China

  • Yue Deng,

    Affiliations School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China, Institute of Software Engineering, Xidian University, Xi'an, People’s Republic of China

  • Xingli Guo

    Affiliation School of Computer Science and Technology, Xidian University, Xi'an, People’s Republic of China

Abstract

Background

The complexity of biological systems motivates the use of the underlying networks to understand disease etiology, with human diseases viewed as perturbations of the dynamic properties of those networks. Control theory, which deals with dynamic systems, has been successfully used to capture systems-level knowledge from large amounts of quantitative biological interaction data. From the perspective of system control, however, the ways in which multiple genetic factors jointly perturb a disease phenotype remain unclear.

Results

In this work, we combine tools from control theory and network science to address the diversified control paths in complex networks. The ways by which disease genes perturb biological systems are then identified and quantified by the control paths in a human regulatory network. Furthermore, as an application, we prioritize candidate genes using control path analysis, with gene ontology annotation used to define similarities. We use leave-one-out cross-validation to evaluate the ability to recover gene-disease relationships. The results show performance comparable with previous sophisticated methods, especially in directed systems.

Conclusions

Our results inspire a deeper understanding of molecular mechanisms that drive pathological processes. Diversified control paths offer a basis for integrated intervention techniques which will ultimately lead to the development of novel therapeutic strategies.

Introduction

Network medicine [1] deals with complexity by simplifying cellular systems, summarizing them as biomolecular networks, which are graphs with components (nodes) and interactions (edges) between them. Different types of biomolecular networks, such as genetic regulatory networks [2, 3], biochemical reaction networks [4, 5] and protein-protein interaction networks [6, 7], represent the functional, biochemical and physical interactions that can be identified with a plethora of technologies [8]. Network-based approaches to human disease regard a complex disease as stemming from malfunctions of the corresponding biomolecular networks [9, 10]. Therefore, one important task is to identify the effects of cellular interconnectedness on disease progression.

Recently, control-theoretic tools for complex networks have become a topic of active pursuit [11–20] and have been successfully used to analyze biomolecular networks. Many dynamic properties of complex disease, mediated by the underlying cellular network, can be learned from the control effects exerted by genetic factors or drugs [20–24]. In particular, Liu et al. [11] introduce a maximum matching approach to predict minimum driver set (MDSet) nodes for the control of various biological networks. Additionally, Liu et al. [20] elucidate the principles behind biochemical network observability by identifying the essential sensors in cell communication or biomarker design. Rajapakse et al. [21] examine various aspects of a genomic state-dependent dynamic network and elaborate on the controllability of genomic networks during processes of genomic reorganization. Wuchty [22] shows that MDSets of proteins are more likely to be essential, cancer-related and virus-targeted genes and are closely related to bottleneck interactions, regulatory and phosphorylation functions, and genetic interactions. Melissa et al. [23] assess the output controllability of protein glycosylation in Chinese Hamster Ovary cells to address the problem of glycosylation heterogeneity. These works, which deal with dynamic systems that respond to external inputs with specific output signals, have successfully characterized important functional properties of MDSet nodes for the control of complex biological networks. However, they focus only on the control roles of nodes. An intriguing question remains: what exactly do the control paths by which genetic factors perturb biological networks look like?

Therefore, we wondered whether control paths are also related to the pathogenesis of complex disease from the perspective of network medicine. Starting from an MDSet of genetic factors, whose time-dependent control can guide the whole biological network to any desired final state, input control signals are transmitted along directed paths to all other genetic factors. These directed paths are called control paths. The dynamical process of propagating the perturbation influence relies on these control paths. We expected that the control paths carry biological significance, for example by corresponding to disease-related pathways.

In this work, the control paths are defined on the maximum matching set (MMSet) edges which are a stem-cycle disjoint cover of the network and show us the directed paths along which the input control signals are transmitted. Moreover, diversified MMSets bring us diversified control paths (DCpaths) in which a node participates. For this node, the downstream reachable set nodes in these DCpaths are used to index its perturbation influence in this network. Focusing on the currently best investigated interactomes we determined the genes’ DCpaths in a human regulatory network. The known disease genes’ perturbation ranges were indeed enriched with disease-related pathways. Furthermore, DCpaths are used to analyse gene-phenotype relationship data from the Online Mendelian Inheritance in Man (OMIM) [25] and to test, by the leave-one-out cross-validation, the application in prioritizing candidates for all diseases with at least two known disease genes. The case studies of Alzheimer Disease, Diabetes Mellitus Type 2 and Leukemia strongly suggest that such well-defined DCpaths have significance in the identification of novel causal genes and disease pathways.

Results

Diversified control paths

According to Kalman’s controllability rank condition [26], a linear time-invariant dynamic system $\dot{x}(t) = A x(t) + B u(t)$ is controllable if and only if the n × nm controllability matrix QC has full rank, i.e.,

$$\operatorname{rank}(Q_C) = \operatorname{rank}\left[B, AB, A^2 B, \ldots, A^{n-1} B\right] = n, \quad (1)$$

where $x(t) \in \mathbb{R}^{n}$ is the state vector, $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix, $B \in \mathbb{R}^{n \times m}$ is the input matrix, $u(t) \in \mathbb{R}^{m}$ is the input vector, m is the number of driver nodes and n is the number of nodes. The underlying directed network of this system is denoted by G(A), with node set V and link set L. However, verifying Kalman’s condition is computationally infeasible for large complex networks. To overcome this difficulty, Liu et al. [11] proposed the concept of a maximum-matching set (MMSet) to assess and quantify the structural controllability of arbitrary complex networks. A particularly useful result is that the number of MDSet nodes (ND) required to fully control a network G(A) is max{n − |M|, 1}. An MMSet is a link set M ⊆ L with maximum cardinality such that no two links in M share a common starting node or a common ending node. |M| denotes the size of the MMSet.
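To make the driver-node count concrete, the following is a minimal sketch rather than the authors' implementation. It finds one maximum matching with a simple augmenting-path search (Kuhn's algorithm, chosen here only for brevity) and applies ND = max{n − |M|, 1}. The function names and the five-node toy network are hypothetical.

```python
def maximum_matching(n, links):
    """Return one maximum matching as a dict {target: source}.

    Nodes are 0..n-1 and links is a list of directed (source, target) pairs.
    No two matched links share a starting node or an ending node.
    """
    out = {u: [] for u in range(n)}
    for u, v in links:
        out[u].append(v)

    matched_source = {}  # target node -> source node of its matched link

    def try_augment(u, seen):
        # Try to match source u, re-routing earlier matches if necessary.
        for v in out[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_source or try_augment(matched_source[v], seen):
                matched_source[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())
    return matched_source


def driver_nodes(n, links):
    """Return (matching, unmatched nodes, N_D) for a directed network."""
    matching = maximum_matching(n, links)
    unmatched = [v for v in range(n) if v not in matching]
    # A perfectly matched network still needs one input, hence the max with 1.
    return matching, unmatched, max(n - len(matching), 1)


# Hypothetical toy network: 0->3, 0->2, 1->2, 2->4.
toy_links = [(0, 3), (0, 2), (1, 2), (2, 4)]
matching, unmatched, n_d = driver_nodes(5, toy_links)
print(matching, unmatched, n_d)
```

In this toy network three links can be matched, so the two nodes without an incoming matched link (0 and 1) act as driver nodes.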

The controllability of a complex network concerns the interaction structure, in which the pattern of influence may be known but not its specific extent [18]. When link weights are unknown or uncertain, controllability is used to uncover the generic properties of systems, independent of parameter values [27]. An MMSet identifies the important links from which the cactus structures of a complex system can be constructed efficiently [11]. The cactus is the most economical topological pattern for propagating control influence, since it is a minimal structure: removing any link renders the structure uncontrollable [28]. Therefore, the MMSet not only reveals the MDSet but also constitutes a backbone of the key control paths. It forms a stem-cycle cover of the original network. Starting from the MDSet nodes, input control signals are transmitted along the directed paths formed by the MMSet links to guide the whole network to any desired final state. These directed paths are called control paths.

Definition 1.

Control Path Set (CPSet) Ck is composed of the control paths which are connected by the links of a maximum-matching set Mk in a complex network.
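As an illustration of Definition 1, the sketch below splits the links of a matching into stems (directed paths that start at an unmatched node) and cycles. It relies only on the fact that every node has at most one incoming and at most one outgoing matched link; the function name and the toy matching are hypothetical.

```python
def control_paths(n, matching_links):
    """Split matched links into stems (paths) and cycles.

    matching_links is a list of (source, target) pairs in which every node
    appears at most once as a source and at most once as a target.
    """
    nxt = dict(matching_links)   # source -> target along matched links
    has_in = set(nxt.values())   # nodes entered by a matched link
    stems, cycles, visited = [], [], set()

    # Stems start at nodes without an incoming matched link (driver nodes).
    for start in range(n):
        if start in has_in:
            continue
        path = [start]
        while path[-1] in nxt:
            path.append(nxt[path[-1]])
        visited.update(path)
        stems.append(path)

    # Every node not on a stem lies on a cycle of matched links.
    for start in range(n):
        if start in visited:
            continue
        cycle = [start]
        visited.add(start)
        while nxt[cycle[-1]] != start:
            cycle.append(nxt[cycle[-1]])
            visited.add(cycle[-1])
        cycles.append(cycle)
    return stems, cycles


# Hypothetical matching on 7 nodes: two stems and one two-node cycle.
stems, cycles = control_paths(7, [(0, 3), (1, 2), (2, 4), (5, 6), (6, 5)])
print(stems)   # [[0, 3], [1, 2, 4]]
print(cycles)  # [[5, 6]]
```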

For example, consider the system with adjacency matrix A and input matrix B in Fig 1A. From Kalman’s controllability matrix QC shown in Fig 1B, we can read off the following structural information for global controllability:

  1. If a41 = 0, rank(QC) < n. Node 4 must only be influenced by node 1.
  2. If a32 = 0, rank(QC) < n. When node 4 is controlled by node 1, node 3 must be influenced by node 2.
  3. If a53 = 0, rank(QC) < n. Node 5 must be influenced by the state of node 3.
  4. If a31 = 0, rank(QC) = n can be true. Without the influence coming from node 1, node 3 and node 5 can also be controlled.
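The four observations above can be checked numerically. The sketch below is an illustration under explicit assumptions, since the full matrices of Fig 1A are not reproduced in the text: it assumes that a41, a32, a53 and a31 are the only nonzero entries of A (each set to 1), that aij denotes the influence of node j on node i, and that the two inputs attach to nodes 1 and 2. All helper names are hypothetical, and exact rational arithmetic is used to avoid rounding issues in the rank.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def rank(M):
    """Rank by Gauss-Jordan elimination over exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def controllability_rank(A, B):
    """Rank of Q_C = [B, AB, A^2 B, ..., A^(n-1) B]."""
    n = len(A)
    blocks, cur = [], B
    for _ in range(n):
        blocks.append(cur)
        cur = matmul(A, cur)
    Q = [sum((blk[i] for blk in blocks), []) for i in range(n)]
    return rank(Q)


def example_rank(a41=1, a32=1, a53=1, a31=1):
    """Assumed 5-node example; inputs on nodes 1 and 2 (an assumption)."""
    A = [[0] * 5 for _ in range(5)]
    A[3][0], A[2][1], A[4][2], A[2][0] = a41, a32, a53, a31
    B = [[1, 0], [0, 1], [0, 0], [0, 0], [0, 0]]
    return controllability_rank(A, B)


print(example_rank())        # 5: controllable
print(example_rank(a41=0))   # 4: node 4 loses its only influence
print(example_rank(a31=0))   # 5: still controllable without link 1 -> 3
```

Under these assumptions, setting a41, a32 or a53 to zero drops the rank below n = 5, whereas setting a31 to zero does not, matching items 1 to 4.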

Fig 1. A schematic diagram of diversified control paths.

(a): a linear dynamic system with adjacency matrix A and input matrix B. (b): the Kalman’s controllability rank condition, if rank(QC) = n, this system is controllable. (c): for the underlying network, propose a maximum-matching set (MMSet) to assess the structural controllability. Links of the MMSet are highlighted by red. (d): for a network G(A), differentiated MMSet M1 and M2, marked by red, construct diversified control path sets. Disease genes are marked by purple and their perturbation ranges are indicated by shadow areas.

https://doi.org/10.1371/journal.pone.0135491.g001

Maximum matching is usually used to solve the assignment problem, so we can also regard a maximum matching as an assignment scheme of control influences in a complex network. Control influences are assigned according to the MMSet marked in red in Fig 1C, and the original network is divided into two control paths, through which all nodes can be controlled by the MDSet nodes marked in green. Although some intricate control information between nodes is inevitably lost, the MMSet retains all four structural properties listed above and shows us the CPSet that governs the whole network. Therefore, we take the control paths as significant pathways that carry critical topological information and are related to the dynamical process of propagating the perturbation influence.

Furthermore, we note that a network generally has diversified MMSets. Each one reveals a unique CPSet through which control influences are transmitted. The approach to enumerate diversified MMSets is given below. As shown in Fig 1D, for the network G(A), the different MMSets M1 and M2, marked in red, can shoulder the same control responsibilities and form diversified CPSets. The perturbation range (Pr) of a given node i under an MMSet Mk is the set of nodes downstream of i along the control paths of Ck,

$$Pr_i(C_k) = \{\, j \in V \mid j \text{ is reachable from } i \text{ along the control paths in } C_k \,\}, \quad (2)$$

where k = 1, 2, …, K and K is the number of existing MMSets. Links in Mk connect the nodes of Pri (Ck) into a cactus structure originating from node i. Lin’s theorem [28] demonstrates that a linear control system is structurally controllable if and only if the associated digraph can be spanned by cacti. Thus the states of the nodes in Pri (Ck) can be fully controlled by influencing node i. The two shaded areas in Fig 1D show the perturbation ranges of two disease genes highlighted in purple.
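A minimal sketch of the perturbation range follows. It assumes that Pri (Ck) is simply the set of nodes reached from node i by following the links of Mk, with node i itself included. Whether cycles attached to a stem through links outside Mk also count is not specified in the text, and this sketch ignores them. The function name and the toy matchings are hypothetical.

```python
def perturbation_range(node, matching_links):
    """Nodes reached from `node` along matched links (node included).

    Each node has at most one outgoing matched link, so the walk is a
    single chain that ends at a dead end or when a cycle closes.
    """
    nxt = dict(matching_links)  # source -> target along matched links
    reached, cur = [node], node
    while cur in nxt and nxt[cur] not in reached:
        cur = nxt[cur]
        reached.append(cur)
    return set(reached)


# Hypothetical matching with two stems: 0 -> 3 and 1 -> 2 -> 4.
m_k = [(0, 3), (1, 2), (2, 4)]
print(perturbation_range(1, m_k))  # {1, 2, 4}
print(perturbation_range(3, m_k))  # {3}
```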

Definition 2.

The set {C1, C2, …, CK} of all control path sets is the diversified control paths (DCpaths) of a complex network.

Then the perturbation influence (Pi) of a given node i can be indicated as

$$Pi_i = \bigcup_{k=1}^{K} Pr_i(C_k). \quad (3)$$
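The sketch below illustrates how diversified MMSets enlarge a node's influence. It assumes that the perturbation influence of a node is the union of its perturbation ranges over all available MMSets, and that a range is the set of nodes reached by following matched links. The five-node network and both matchings are hypothetical.

```python
def perturbation_influence(node, matchings):
    """Union over all matchings of the nodes reached from `node`."""
    influence = set()
    for links in matchings:
        nxt = dict(links)  # source -> target along matched links
        cur, seen = node, {node}
        while cur in nxt and nxt[cur] not in seen:
            cur = nxt[cur]
            seen.add(cur)
        influence |= seen
    return influence


# Two maximum matchings of the hypothetical network
# 0->3, 0->2, 1->2, 1->3, 2->4.
m1 = [(0, 3), (1, 2), (2, 4)]
m2 = [(0, 2), (1, 3), (2, 4)]

pi_0 = perturbation_influence(0, [m1, m2])
pi_1 = perturbation_influence(1, [m1, m2])
print(pi_0)         # {0, 2, 3, 4}
print(pi_1)         # {1, 2, 3, 4}
print(pi_0 & pi_1)  # {2, 3, 4}
```

Under either matching alone, nodes 0 and 1 have disjoint ranges; only the union over both matchings exposes their shared influence on nodes 2, 3 and 4.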

What we exactly want to do is use the perturbation influence, based on the control paths, to identify and quantify the ways by which disease genes perturb biological systems.

Perturbation influence of disease genes

First, we focus on how the known disease genes intervene in a biological system. The DCpaths of a human regulatory network (Table A in S1 File) are detected to reveal the perturbation influences of disease genes. Intuitively, for a given disease, the overlap of the disease genes’ perturbation influences can be taken as the significant pathways that are related to its etiology.

In Fig 2, Tuberculosis (MIM:107470) in OMIM has two disease genes, IFNGR1 and IFNG, which are also present in the partial regulatory network. Fig 2A and 2B show two different CPSets (highlighted in red) and the perturbation ranges of IFNGR1 and IFNG (circled with red and blue dotted lines respectively). Diversified control paths indicate the perturbation influences of IFNGR1 and IFNG (marked by red and blue shading respectively) in Fig 2C. Their overlapping gene set {CDK4, CSDA, CKS1B, SKP2, CDKN1B} is considered a potential pathway closely related to the pathogenesis of Tuberculosis. All five genes participate in the small cell lung cancer pathway (hsa05222 in KEGG [29]).

Fig 2. The perturbation influences of disease genes of the Tuberculosis (MIM:107470).

(a): A CPSet of the partial human regulatory network, the MMSet links are highlighted by red, the disease gene IFNGR1 and IFNG are marked by purple, their perturbation ranges are circled with red and blue dotted lines respectively. (b): Another differentiated CPSet. (c): the perturbation influences of IFNGR1 and IFNG are marked by red and blue shadow respectively.

https://doi.org/10.1371/journal.pone.0135491.g002

Also, we proceed to execute the DCpaths analysis on Thrombocythemia (MIM: 187950) and Immunodeficiency (MIM: 610163). The results are given in Table 1 with the disease name, disease gene list, genes’ perturbation influences and their common Gene Ontology (GO) terms [30]. For Thrombocythemia, the three known disease genes TPO, JAK2 and MPL can thoroughly perturb some common ranges which have the same biological functions, such as JAK-STAT cascade, growth hormone receptor signaling pathway, cytokine-mediated signaling pathway, etc. For Immunodeficiency, its known disease genes CD3E and CD3G almost have the same perturbation influences on the regulatory network, which chiefly affect the immune response of human.

Table 1. Instances for the perturbation influence of disease gene.

https://doi.org/10.1371/journal.pone.0135491.t001

These examples indicate that the DCpaths can be used to index the ways by which disease genes influence pathological processes. For the same disease, the perturbation influences of the known disease genes are similar and are enriched with disease-related pathways. To further demonstrate the power of DCpaths, we prioritize candidate genes based on the assumption that genes cause the same disease by driving the same perturbation influence in the human regulatory network.

Prioritization of candidate genes

By examining the location of a candidate relative to all of the known disease genes in terms of perturbation influence, we assign a score to each candidate gene:

$$\mathrm{score}(i, d) = \max_{x \in X_d} \mathrm{sim}(Pi_i, Pi_x), \quad (4)$$

where Xd is the set of known disease genes of disease d and, for any disease gene x ∈ Xd, sim(Pii, Pix) is the biological functional similarity between the perturbation influences Pii and Pix. The maximum of these similarities is taken as the score of gene i for disease d. The details of how the similarity is obtained are given below. The genes are then ranked by score to define a priority list of candidates for further biological investigation.
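The scoring and ranking step can be sketched as follows. This is an illustration only: the Jaccard coefficient is used as a stand-in for the GO-based functional similarity described in the Methods, and the gene names, perturbation influences and function names are all hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity; the paper uses a GO-based functional similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(candidate, disease_genes, influence, sim):
    """Best similarity between the candidate's influence and any known gene's."""
    return max(sim(influence[candidate], influence[x]) for x in disease_genes)


def prioritize(candidates, disease_genes, influence, sim):
    """Candidates sorted by decreasing score (ties broken by name)."""
    scored = [(score(c, disease_genes, influence, sim), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Hypothetical perturbation influences, written as sets of node ids.
influence = {
    "g1": {1, 2, 3}, "g2": {2, 3, 4},                   # known disease genes
    "c1": {2, 3}, "c2": {7, 8}, "c3": {1, 2, 3, 4},     # candidates
}
ranking = prioritize(["c1", "c2", "c3"], ["g1", "g2"], influence, jaccard)
print(ranking)  # ['c3', 'c1', 'c2']
```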

Table 2 reports the results for Alzheimer Disease (MIM:104300), Breast cancer (MIM:114480) and Colorectal cancer (MIM:114500). Unsurprisingly, our method assigned high ranks to the known disease genes in all cases. Moreover, we detect some potential causal genes among the top-ranked candidate genes, as shown in Table 3. Most of them have been shown in the existing literature to be closely related to the pathogenesis of the corresponding diseases. For instance, Kanekiyo et al. [31] have demonstrated that the low-density lipoprotein receptor-related protein 1 (LRP1) plays a critical role in the clearance of brain amyloid-β (Aβ) peptides, and the accumulation, aggregation, and deposition of Aβ are likely initiating events in the pathogenesis of Alzheimer’s disease (AD); Protein precursor cleaving enzyme 1 (BACE1) is the first protease and the rate-limiting enzyme in the genesis of amyloid-β, and this protein remains an important potential disease-modifying target for the development of drugs to treat AD [32]; Protein kinase C-alpha (PRCK1) regulates MDR1 expression with siRNA and reverses chemoresistance of ovarian cancer [33]; Epidermal growth factor (EGF) receptor is frequently overexpressed in the malignant phenotype of ovarian cancer, leading to increased cell proliferation and survival [34]; AQP7 is a glycerol channel in adipose tissue with a suggested role in controlling the accumulation of triglycerides and, secondarily, the development of obesity and type-2 diabetes [35]; PIM-2 is a proto-oncogene that is highly expressed in neoplastic tissues and in leukemic and lymphoma cell lines, and the nuclear factor kappa B (NFKB1) pathway appears to be deregulated in a variety of tumors, with sustained activity of NFKB1 leading to apoptotic resistance in tumor cells [36].

Table 2. The ranks of known disease genes for three instances.

https://doi.org/10.1371/journal.pone.0135491.t002

Table 3. The top ranked candidate genes for some instances.

https://doi.org/10.1371/journal.pone.0135491.t003

We test our DCpaths-based method by Leave-One-Out Cross Validation (LOO-CV) [37]. In each cross-validation trial, one disease-gene association is removed; if the removed gene is ranked within the top k% over the entire human regulatory network, the association is said to be reconstructed successfully. We evaluated the prioritization results in terms of overall recall while varying the rank threshold k%. In Fig 3, the comparison with the sophisticated method PRINCE [38], obtained by prioritizing candidates for all 112 diseases in the LOO-CV, shows that our method achieves prediction outcomes comparable with PRINCE, which further illustrates that disease genes perturb the biological system through the DCpaths described above.
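The evaluation loop can be sketched as below. Several details are assumptions rather than statements from the text: the remaining known genes of the disease are excluded from the candidate list, ties are resolved in favour of the held-out gene, and the Jaccard coefficient again stands in for the GO-based similarity. All names and data are hypothetical.

```python
import math


def jaccard(a, b):
    """Stand-in similarity; the paper uses a GO-based functional similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def loo_recall(associations, all_genes, influence, sim, k_percent):
    """Fraction of held-out disease genes ranked within the top k percent."""
    cutoff = math.ceil(len(all_genes) * k_percent / 100)
    hits = trials = 0
    for genes in associations.values():      # each disease needs >= 2 genes
        for held_out in genes:
            known = [g for g in genes if g != held_out]
            candidates = [g for g in all_genes if g not in known]
            scores = {c: max(sim(influence[c], influence[x]) for x in known)
                      for c in candidates}
            # Rank = 1 + number of candidates scoring strictly higher.
            rank = 1 + sum(s > scores[held_out] for s in scores.values())
            hits += rank <= cutoff
            trials += 1
    return hits / trials


# Hypothetical data: five genes, two diseases with two known genes each.
influence = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {2, 3},
             "d": {7, 8}, "e": {8, 9}}
associations = {"D1": ["a", "b"], "D2": ["d", "e"]}
genes = list(influence)

print(loo_recall(associations, genes, influence, jaccard, 20))  # 0.5
print(loo_recall(associations, genes, influence, jaccard, 40))  # 1.0
```

Plotting the returned recall against a range of k values would give a curve of the kind described for Fig 3.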

Fig 3. Comparison in performance between our method and PRINCE.

A plot of recall versus rank threshold, rank threshold k% means that the gene was ranked within top k%.

https://doi.org/10.1371/journal.pone.0135491.g003

Case study

To further demonstrate the significance of perturbation influences, we examine whether the predicted causal genes of multifactorial disorders are enriched in disease pathways. Alzheimer Disease (MIM:104300), Diabetes Mellitus, Type 2 (MIM:125853) and Leukemia (MIM:601626) are selected for case studies. We take the top 30 ranked candidate genes for these cases as the causal genes and use GeneTrail [39] to check the metabolic pathways in which they participate. Typical output of GeneTrail is a set of metabolic pathway terms with the sizes of the query and term gene lists, their overlapping genes and the statistical significance (p-value) of the enrichment.

Alzheimer Disease (MIM:104300) in OMIM gives a list of 6 known disease genes, which are also present in the human regulatory network. Besides these genes, the functional enrichment of the other 24 causal genes within the top 30 is analysed in Table 4. We can see that 8 of them are involved in hsa04610: Complement and coagulation cascades (p-value = 4.69e-08), that 5 of them are involved in hsa05010: Alzheimer's disease (p-value = 6.04e-04), etc. Almost all the pathways are closely related to the current knowledge on Alzheimer Disease.

Table 4. Enrichment analysis of causal genes in Alzheimer Disease.

https://doi.org/10.1371/journal.pone.0135491.t004

Diabetes Mellitus, Type 2 (MIM:125853) in OMIM gives a list of 12 known disease genes, which are also present in the human regulatory network. Besides these genes, the functional enrichment of the other 18 causal genes within the top 30 is analysed in Table 5. We can see that 8 of them are involved in hsa04930: Type II diabetes mellitus (p-value = 3.75e-09), that 7 of them are involved in hsa04960: Aldosterone-regulated sodium reabsorption (p-value = 5.43e-09), that 11 of them are involved in hsa04910: Insulin signaling pathway (p-value = 1.12e-08), etc. These agree well with the current knowledge on Diabetes Mellitus.

Table 5. Enrichment analysis of causal genes in Diabetes Mellitus, Type 2.

https://doi.org/10.1371/journal.pone.0135491.t005

Leukemia (MIM:601626) in OMIM gives a list of 12 known disease genes, which are also present in the human regulatory network. Besides these genes, the functional enrichment of the other 18 causal genes within the top 30 is analysed in Table 6. We can see that 8 of them are involved in hsa05221: Acute myeloid leukemia (p-value = 1.51e-07), that 7 of them are involved in hsa04662: B cell receptor signaling pathway (p-value = 3.41e-06), etc. These agree well with the current knowledge on Leukemia.

Table 6. Enrichment analysis of causal genes in Leukemia.

https://doi.org/10.1371/journal.pone.0135491.t006

Robustness

To show the robustness of DCpaths, we test our method through prioritization of oncogenes, the stability of the perturbation influence of genes, and a case study on Breast cancer (MIM:114480) in a cancer signaling map (Table B in S1 File) [40]. The cancer signaling map was constructed by using cancer mutations and the literature-curated human signaling network. Characterizing an overall picture of cancer, this network contains 326 nodes and 892 edges, of which 259 edges are undirected; we convert each undirected edge into a pair of opposite directed edges. The resulting cancer signaling map, with 326 nodes and 1151 directed edges (892 + 259), is used to reveal the signaling architecture of cancer. 30 known cancer-gene associations are selected from the OMIM knowledge database.

For prioritization of oncogenes, the 2-fold, 5-fold and 10-fold cross-validation results are provided in Fig 4A, 4B and 4C. Comparisons with PRINCE [38] on AUC [41] show that our method achieves better prediction outcomes than PRINCE. These results also reflect the advantages of DCpaths in the analysis of directed relationships.

Fig 4. Analysis results for robustness of our method.

(a) The AUC of 2-fold cross validation for prioritization of oncogenes in the cancer signaling map. (b) The AUC of 5-fold cross validation for prioritization of oncogenes in the cancer signaling map. (c) The AUC of 10-fold cross validation for prioritization of oncogenes in the cancer signaling map. (d) The stability of perturbation influence of the regulatory network. (e) The stability of perturbation influence of the cancer signaling map.

https://doi.org/10.1371/journal.pone.0135491.g004

For the stability of the perturbation influence of genes, we remove a certain proportion of edges and assign a score to the stability of perturbation influence (SPi):

$$SPi_i = \frac{|Pi_i \cap Pi_i'|}{|Pi_i \cup Pi_i'|}, \quad (5)$$

where Pii′ indicates the perturbation influence of a given node i after a certain proportion of edges has been removed. That is, we take the Jaccard coefficient between the perturbation influences of a node before and after deletion as the measure of stability. The average stability over all nodes is then used as the stability of perturbation influence of a network. Fig 4D and 4E show the SPi of the regulatory network and of the cancer signaling map respectively; 20 randomized experiments are conducted for each proportion. As the percentage of removed edges increases, the SPi decreases. On average, however, a node maintains almost 70% of its original perturbation influence after 10% of the edges are removed, in both the regulatory network and the cancer signaling map. An individual MMSet is strongly affected by edge removal, and even its size may change. We use a random process to obtain as many different MMSets as possible (see the next section). Because it is based on diversified MMSets, DCpaths removes the influence of any single changeable MMSet, and Pi shows good stability under edge removal.
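The stability score can be sketched as follows. The sketch covers only the edge-removal step and the Jaccard comparison; recomputing the perturbation influences on the reduced network is left out, so the "after" influences below are made-up values. The function names, the seed parameter and the convention that two empty sets have stability 1.0 are assumptions.

```python
import random


def remove_edges(links, proportion, seed=0):
    """Return a copy of `links` with a random proportion of edges removed."""
    rng = random.Random(seed)
    keep = len(links) - round(len(links) * proportion)
    return rng.sample(links, keep)


def stability(before, after):
    """Jaccard coefficient between a node's influence before and after."""
    union = before | after
    return len(before & after) / len(union) if union else 1.0


def network_stability(influence_before, influence_after):
    """Average stability over all nodes of the network."""
    nodes = list(influence_before)
    total = sum(stability(influence_before[v], influence_after[v])
                for v in nodes)
    return total / len(nodes)


# Hypothetical influences before and after removing some edges.
before = {"a": {1, 2, 3, 4}, "b": {5, 6}}
after = {"a": {1, 2, 3}, "b": {5, 6}}
print(network_stability(before, after))  # 0.875

links = [(i, i + 1) for i in range(10)]
print(len(remove_edges(links, 0.1)))     # 9
```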

As a case study, we present the results of prioritizing candidate genes for Breast cancer (MIM:114480) in the cancer signaling map. Breast cancer in OMIM gives a list of 5 known disease genes (APC, ATM, p53, PI3K and CDH1), which are also present in the cancer signaling map. Besides these genes, some potential causal genes among the top-ranked candidate genes are shown in Table 7. Furthermore, we have downloaded the somatic mutations for Breast cancer (Table C in S1 File) from TCGA [42]. Almost all of the potential causal genes in Table 7 are mutated in more than 2 samples. Given these mutations, they may play an important role in Breast cancer. Fig 5 displays the perturbation influence of the disease genes by showing a subnetwork of the cancer signaling map. This subnetwork contains the 5 known disease genes (highlighted in red), the 15 top-ranked potential causal genes (highlighted in magenta) and the genes (highlighted in purple) in the diversified control paths of the disease genes. These directed paths identify the ways by which the 5 disease genes perturb the cancer signaling map. The 15 potential causal genes are also able to perturb these pathways, because they possess their own directed control paths that lead into the perturbation influence of the disease genes.

Fig 5. Sketch of prediction results for Breast cancer in Cancer Signaling Map.

5 known disease genes are highlighted by red color, 15 top ranked potential causal genes are highlighted by magenta color and the genes in the diversified control paths of disease genes are highlighted by purple color.

https://doi.org/10.1371/journal.pone.0135491.g005

Table 7. The top ranked candidate genes for Breast cancer in Cancer Signaling Map.

https://doi.org/10.1371/journal.pone.0135491.t007

Materials and Methods

The DCpaths-based approach requires a directed network as input. In this study, we consider a human KEGG regulatory network (Table A in S1 File) constructed by Backes et al. [3]. This network contains the regulatory relationships selected from all KEGG pathways and can be downloaded from the website http://genetrail.bioinf.uni-sb.de/ilp/Home.html. Backes et al. access the data via the Biochemical Network Database (BNDB) [43] for a consistent interface. The network contains 2010 genes connected by 9900 regulatory relationships, from which the 1579 genes annotated by GO terms, with 7630 regulatory relationships among them, are selected to form our human regulatory network. The GO annotation is essential for calculating the biological functional similarity between perturbation influences. Our Leave-One-Out Cross Validation process requires each disease to have at least 2 known disease genes. Therefore, the 366 known disease-gene associations that satisfy this condition are chosen from the OMIM knowledge database, relating 252 known disease genes to 112 diseases.

Diversified MMSets enumeration

For a given directed network, any one of the existing algorithms [44, 45] can be used to compute an MMSet. The Markov process described by Jia et al. [46] performs unbiased random sampling among all MMSets and can be used to estimate the role of each vertex in controlling the network. We used the approach of Wang et al. [47] to enumerate diversified MMSets. Beginning from an MMSet, the procedure randomly chooses a link in this MMSet, enumerates all alternative MMSets that include all other elements except this link, then randomly chooses one of these MMSets as the current MMSet and repeats the process. For the prioritization of candidate genes in the human regulatory network, the DCpaths in our results were obtained from 827 diversified MMSets found in 18649 random samples.
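The random walk over MMSets can be sketched as below, under a literal reading of the description: after one matched link is dropped, every link whose starting node and ending node are both free in the remaining matching yields an alternative MMSet (the dropped link itself is one of them). This is an interpretation, not the implementation of Wang et al. [47], and all names and the toy network are hypothetical.

```python
import random


def alternatives(links, matching, dropped):
    """All maximum matchings that keep every matched link except `dropped`."""
    rest = [e for e in matching if e != dropped]
    used_sources = {u for u, _ in rest}
    used_targets = {v for _, v in rest}
    # Any free link restores the original size, so the result stays maximum.
    return [rest + [(u, v)] for u, v in links
            if u not in used_sources and v not in used_targets]


def sample_matchings(links, start, steps, seed=0):
    """Random walk over maximum matchings, starting from `start`."""
    rng = random.Random(seed)
    current = list(start)
    found = {frozenset(current)}
    for _ in range(steps):
        dropped = rng.choice(current)
        current = rng.choice(alternatives(links, current, dropped))
        found.add(frozenset(current))
    return found


# Hypothetical network 0->1, 0->2, 1->3 with maximum matching {0->1, 1->3}.
links = [(0, 1), (0, 2), (1, 3)]
start = [(0, 1), (1, 3)]
print(alternatives(links, start, (0, 1)))
found = sample_matchings(links, start, steps=50)
print(len(found))
```

Under this reading, one step swaps a single link, so MMSets that differ in several links are reached only through a chain of intermediate MMSets, if such a chain exists.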

Functional similarity between the perturbation influences

Then we present the detailed description of the algorithm to calculate the biological functional similarity between the perturbation influence Pii and Pix of given gene i and x.

Algorithm for sim(Pii, Pix):

Input the gene set Pii and Pix

Construct a bipartite graph BP(Pii, Pix, E), ∀u ∈ Pii, ∀l ∈ Pix, E(u, l) = GOSimBMA (u, l) [48]

Solve the Maximum Weight Bipartite Matching Problem on BP by the Hungarian algorithm [41].

Output the sum of the weights of the maximum matching as sim(Pii, Pix).
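The steps above can be sketched for very small gene sets as follows. Instead of the Hungarian algorithm, the sketch solves the maximum-weight bipartite matching by exhaustive search over assignments, which gives the same optimum but is practical only for a handful of genes. The pairwise similarity table is a made-up stand-in for GOSimBMA, assumed to be symmetric and non-negative, and all names are hypothetical.

```python
from itertools import permutations


def sim(set_a, set_b, gene_sim):
    """Weight of the best one-to-one pairing between two gene sets.

    Exhaustive search replaces the Hungarian algorithm here; with
    non-negative weights, pairing every gene of the smaller set is optimal.
    """
    small, large = sorted((sorted(set_a), sorted(set_b)), key=len)
    best = 0.0
    for chosen in permutations(large, len(small)):
        total = sum(gene_sim(u, l) for u, l in zip(small, chosen))
        best = max(best, total)
    return best


# Hypothetical pairwise similarities (stand-in for GO-based values).
table = {("a", "x"): 0.9, ("a", "y"): 0.5, ("b", "x"): 0.8, ("b", "y"): 0.1}


def gene_sim(u, l):
    """Symmetric lookup; unknown pairs have similarity 0."""
    return table.get((u, l), table.get((l, u), 0.0))


# Greedy pairing (a-x first) would give 1.0; the optimum pairs a-y and b-x.
print(sim({"a", "b"}, {"x", "y"}, gene_sim))
print(sim({"a"}, {"x", "y"}, gene_sim))  # 0.9
```

For gene sets of realistic size, an implementation of the Hungarian algorithm, as named in the text, would replace the exhaustive loop.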

The perturbation influences of gene i and x are the gene set Pii and Pix, we detect the best matching between these two gene sets, and use the GO annotation similarity to quantify the functional similarity of each matching pair (u, l). The similarity of two genes, indicated as GOSimBMA (u, l), is computed by the method of Wang et al. [48]. The more genes in Pii having consistent functions with the genes in Pix, the higher value sim(Pii, Pix) achieves.

Conclusions

Medium-scale subnetworks, such as motifs and communities, represent the functional structures of a complex system. An MMSet decomposes a network into medium-scale structures (stems and cycles), which are the underlying control paths through which the whole system can be controlled effectively with the minimum number of driver nodes. Our goal is therefore to detect the significant DCpaths in order to quantify the perturbation influences of genetic factors in the biological system. Although the influences of genetic factors are complicated and entangled, DCpaths is an effective means of analyzing the intricate control relationships between them. To verify the power of DCpaths, we have applied it to the prioritization of candidate genes in the human regulatory network to analyze the perturbations of known disease genes, predict causal genes and detect disease pathways.

DCpaths has several considerable merits for analyzing pathogenesis: it offers a way to understand complex disease from a new perspective, namely how and to what extent a genetic factor influences the network; its calculation does not depend on the weights of the regulatory relationships; and a DCpaths-based method can reveal very important functional relationships between genetic factors that cannot be detected by methods based on common neighbours or reachable paths, especially in directed biological systems. For instance, in Fig 2, the disease genes IFNGR1 and IFNG of Tuberculosis have no common neighbours or reachable paths to each other, but their shared influence on the dynamic properties of this disease can be uncovered based on DCpaths in the human regulatory network.

Supporting Information

S1 File. The experimental data.

The human KEGG regulatory network (Table A), the cancer signaling map (Table B) and the somatic mutations for Breast cancer from TCGA (Table C).

https://doi.org/10.1371/journal.pone.0135491.s001

(XLS)

Acknowledgments

We are grateful to the reviewers and the editor for their useful comments and suggestions which improved our method. We thank Dr. Backes et al. for the original human KEGG regulatory network data, and also thank Dr. Xiaofei Yang for reviewing the manuscript.

Author Contributions

Conceived and designed the experiments: BBW LG. Performed the experiments: QFZ AL. Analyzed the data: BBW YD XLG. Wrote the paper: BBW LG.

References

  1. 1. Barabási A-L, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011;12: 56–68. pmid:21164525
  2. 2. Consortium TF, Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, et al. The Transcriptional Landscape of the Mammalian Genome. Science. 2005;309: 1559–1563. pmid:16141072
  3. 3. Backes C, Rurainski A, Klau GW, Müller O, Stöckel D, Gerasch A, et al. An integer linear programming approach for finding deregulated subgraphs in regulatory networks. Nucl Acids Res. 2012;40: e43–e43. pmid:22210863
  4. 4. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási A-L. The large-scale organization of metabolic networks. Nature. 2000;407: 651–654. pmid:11034217
  5. 5. Agren R, Bordel S, Mardinoglu A, Pornputtapong N, Nookaew I, Nielsen J. Reconstruction of Genome-Scale Active Metabolic Networks for 69 Human Cell Types and 16 Cancer Types Using INIT. PLoS Comput Biol. 2012;8: e1002518. pmid:22615553
  6. 6. Rual J-F, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature. 2005;437: 1173–1178. pmid:16189514
  7. 7. Rolland T, Taşan M, Charloteaux B, Pevzner SJ, Zhong Q, Sahni N, et al. A Proteome-Scale Map of the Human Interactome Network. Cell. 2014;159: 1212–1226. pmid:25416956
  8. 8. Venkatesan K, Rual J-F, Vazquez A, Stelzl U, Lemmens I, Hirozane-Kishikawa T, et al. An empirical framework for binary interactome mapping. Nat Meth. 2009;6: 83–90.
  9. 9. Vidal M, Cusick ME, Barabási A-L. Interactome Networks and Human Disease. Cell. 2011;144: 986–998. pmid:21414488
  10. 10. Zhou X, Menche J, Barabási A-L, Sharma A. Human symptoms–disease network. Nat Commun. 2014;5.
  11. 11. Liu Y-Y, Slotine J-J, Barabási A-L. Controllability of complex networks. Nature. 2011;473: 167–173. pmid:21562557
  12. 12. Liu Y-Y, Slotine J-J, Barabási A-L. Control Centrality and Hierarchical Structure in Complex Networks. PLoS ONE. 2012;7: e44459. pmid:23028542
  13. 13. Wang B, Gao L, Gao Y. Control range: a controllability-based index for node significance in directed networks. J Stat Mech. 2012;2012: P04011.
  14. 14. Yuan Z, Zhao C, Di Z, Wang W-X, Lai Y-C. Exact controllability of complex networks. Nat Commun. 2013;4.
  15. 15. Cornelius SP, Kath WL, Motter AE. Realistic control of network dynamics. Nat Commun. 2013;4.
  16. 16. Sun J, Motter AE. Controllability Transition and Nonlocality in Network Control. Phys Rev Lett. 2013;110: 208701. pmid:25167459
  17. 17. Wang B, Gao L, Gao Y, Deng Y. Maintain the structural controllability under malicious attacks on directed networks. EPL. 2013;101: 58003.
  18. 18. Ruths J, Ruths D. Control Profiles of Complex Networks. Science. 2014;343: 1373–1376. pmid:24653036
  19. 19. Gao J, Liu Y-Y, D’Souza RM, Barabási A-L. Target control of complex networks. Nat Commun. 2014;5.
  20. 20. Liu Y-Y, Slotine J-J, Barabási A-L. Observability of complex systems. PNAS. 2013;110: 2460–2465. pmid:23359701
  21. 21. Rajapakse I, Groudine M, Mesbahi M. Dynamics and control of state-dependent networks for probing genomic organization. PNAS. 2011;108: 17257–17262. pmid:21911407
  22. 22. Wuchty S. Controllability in protein interaction networks. PNAS. 2014;111: 7156–7160. pmid:24778220
  23. 23. St. Amand MM, Tran K, Radhakrishnan D, Robinson AS, Ogunnaike BA. Controllability Analysis of Protein Glycosylation in Cho Cells. PLoS One. 2014;9.
  24. 24. Asgari Y, Salehzadeh-Yazdi A, Schreiber F, Masoudi-Nejad A. Controllability in Cancer Metabolic Networks According to Drug Targets as Driver Nodes. PLoS ONE. 2013;8: e79397. pmid:24282504
  25. 25. Amberger J, Bocchini CA, Scott AF, Hamosh A. McKusick’s Online Mendelian Inheritance in Man (OMIM®). Nucl Acids Res. 2009;37: D793–D796. pmid:18842627
  26. 26. Kalman R. Mathematical Description of Linear Dynamical Systems. Journal of the Society for Industrial and Applied Mathematics Series A Control. 1963;1: 152–192.
  27. 27. Poljak S. On the generic dimension of controllable subspaces. IEEE Transactions on Automatic Control. 1990;35: 367–369.
  28. 28. Lin C-T. Structural controllability. IEEE Transactions on Automatic Control. 1974;19: 201–208.
  29. 29. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucl Acids Res. 2000;28: 27–30. pmid:10592173
  30. 30. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25: 25–29. pmid:10802651
  31. 31. Kanekiyo T, Cirrito JR, Liu C-C, Shinohara M, Li J, Schuler DR, et al. Neuronal Clearance of Amyloid-β by Endocytic Receptor LRP1. J Neurosci. 2013;33: 19276–19283. pmid:24305823
  32. 32. Cheng X, Zhou Y, Gu W, Wu J, Nie A, Cheng J, et al. The Selective BACE1 Inhibitor VIa Reduces Amyloid-β Production in Cell and Mouse Models of Alzheimer's Disease. Journal of Alzheimer's Disease. 2013;37: 823–834. pmid:23948917
  33. 33. Zhao L-J, Xu H, Qu J-W, Zhao W-Z, Zhao Y-B, Wang J-H. Modulation of Drug Resistance in Ovarian Cancer Cells by Inhibition of Protein Kinase C-alpha (PKC-α) with Small Interference RNA (siRNA) Agents. Asian Pacific Journal of Cancer Prevention. 2012;13: 3631–3636. pmid:23098446
  34. 34. Cho M, Kabir SM, Dong Y, Lee E, Rice VM, Khabele D, et al. Aspirin Blocks EGF-stimulated Cell Viability in a COX-1 Dependent Manner in Ovarian Cancer Cells. J Cancer. 2013;4: 671–678. pmid:24155779
  35. 35. Lebeck J, Østergård T, Rojek A, Füchtbauer E-M, Lund S, Nielsen S, et al. Gender-specific effect of physical training on AQP7 protein expression in human adipose tissue. Acta Diabetol. 2012;49: 215–226. pmid:23001483
  36. 36. Kapelko-Słowik K, Urbaniak-Kujda D, Wołowiec D, Jaźwiec B, Dybko J, Jakubaszko J, et al. Increased expression of PIM-2 and NF-κB genes in patients with acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) is associated with complete remission rate and overall survival. Advances in Hygiene & Experimental Medicine / Postepy Higieny i Medycyny Doswiadczalnej. 2013;67: 553–559.
  37. 37. Kohavi R. A Study of Cross-validation and Bootstrap for Accuracy Estimation and Model Selection. In Ijcai. 1995, August. Vol. 14, No. 2, pp. 1137–1145.
  38. 38. Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating Genes and Protein Complexes with Disease via Network Propagation. PLoS Comput Biol. 2010;6: e1000641. pmid:20090828
  39. 39. Keller A, Backes C, Al-Awadhi M, Gerasch A, Küntzer J, Kohlbacher O, et al. GeneTrailExpress: a web-based pipeline for the statistical evaluation of microarray experiments. BMC Bioinformatics. 2008;9: 552. pmid:19099609
  40. 40. Cui Q, Ma Y, Jaramillo M, Bari H, Awan A, Yang S, et al. A map of human cancer signaling. Molecular Systems Biology. 2007;3.
  41. 41. Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters. 2006;27: 861–874.
  42. 42. Leiserson MDM, Blokh D, Sharan R, Raphael BJ. Simultaneous Identification of Multiple Driver Pathways in Cancer. PLoS Comput Biol. 2013;9: e1003054. pmid:23717195
  43. 43. Küntzer J, Backes C, Blum T, Gerasch A, Kaufmann M, Kohlbacher O, et al. BNDB–The Biochemical Network Database. BMC Bioinformatics. 2007;8: 367. pmid:17910766
  44. 44. Kuhn HW. The Hungarian method for the assignment problem. Naval Research Logistics. 2005;52: 7–21.
  45. 45. Hopcroft J, Karp R. An n^5/2 Algorithm for Maximum Matchings in Bipartite Graphs. SIAM J Comput. 1973;2: 225–231.
  46. 46. Jia T, Barabási A-L. Control Capacity and A Random Sampling Method in Exploring Controllability of Complex Networks. Sci Rep. 2013;3.
  47. 47. Wang B, Gao L, Gao Y, Deng Y, Wang Y. Controllability and observability analysis for vertex domination centrality in directed networks. Sci Rep. 2014;4.
  48. 48. Wang JZ, Du Z, Payattakool R, Yu PS, Chen C-F. A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007;23: 1274–1281. pmid:17344234