In:
Frontiers in Cell and Developmental Biology, Frontiers Media SA, Vol. 8 (2020-12-1)
Abstract:
Cysteine S-sulphenylation (CSO), a novel post-translational modification (PTM), has emerged as a potential mechanism for regulating protein function and affecting signaling networks. Because of its functional significance, several prediction approaches have been developed. However, they are based on a limited dataset from Homo sapiens, and prediction tools for CSO sites in other species are lacking. Recently, this modification has been investigated at the proteomic scale in a few species, and the number of identified CSO sites has increased significantly. It is therefore essential to explore the characteristics of this modification across species and to construct better-performing prediction models from the enlarged dataset. In this study, we constructed several classifiers and found that the long short-term memory model with the word-embedding encoding approach, dubbed LSTM_WE, compares favorably with traditional machine-learning models and other deep-learning models across species in both cross-validation and independent tests. The area under the receiver operating characteristic (ROC) curve for LSTM_WE ranged from 0.82 to 0.85 across organisms, which was superior to that of previously reported CSO predictors. Moreover, we developed a general model based on the integrated data from different species, and it showed great universality and effectiveness. We provide an online prediction service called DeepCSO that includes both species-specific and general models and is accessible at http://www.bioinfogo.org/DeepCSO.
Type of Medium:
Online Resource
ISSN:
2296-634X
DOI:
10.3389/fcell.2020.594587
DOI:
10.3389/fcell.2020.594587.s001
DOI:
10.3389/fcell.2020.594587.s002
Language:
Unknown
Publisher:
Frontiers Media SA
Publication Date:
2020
ZDB-ID:
2737824-X