  • 1
    Online Resource
    Springer Science and Business Media LLC ; 2022
    In: Human Genetics, Springer Science and Business Media LLC, Vol. 141, No. 10 ( 2022-10), p. 1629-1647
    Abstract: The emergence of SARS-CoV-2 variants stressed the demand for tools that interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results remain challenging to analyze. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation from single sequences almost as accurately as ConSeq using MSAs (two-state Matthews Correlation Coefficient, MCC, of 0.596 ± 0.006 for ProtT5 embeddings vs. 0.608 ± 0.006 for ConSeq). A simple logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA), taking as input the conservation prediction along with BLOSUM62 substitution scores and pLM mask reconstruction probabilities, predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments to other methods (incl. ESM-1v, DeepSequence, and GEMME) showed our approach to be competitive with the state-of-the-art (SOTA) methods using MSA input. No method outperformed all others either consistently or with statistical significance, regardless of the performance measure applied (Spearman or Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with methods relying on MSAs for SAV effect prediction, at a fraction of the computing and energy costs.
    Our method predicted SAV effects for the entire human proteome (~20k proteins) within 40 min on one Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA, and PredictProtein.
    Type of Medium: Online Resource
    ISSN: 0340-6717 , 1432-1203
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2022
    ZDB-ID: 1459188-1
    SSG: 12
  • 2
    Online Resource
    Elsevier BV ; 2007
    In: Methods, Elsevier BV, Vol. 41, No. 4 ( 2007-04), p. 460-474
    Type of Medium: Online Resource
    ISSN: 1046-2023
    Language: English
    Publisher: Elsevier BV
    Publication Date: 2007
    ZDB-ID: 1471152-7
    SSG: 12
  • 3
    Online Resource
    Springer Science and Business Media LLC ; 2022
    In: BMC Bioinformatics, Springer Science and Business Media LLC, Vol. 23, No. 1 ( 2022-08-08)
    Abstract: Despite the immense importance of transmembrane proteins (TMPs) for molecular biology and medicine, experimental 3D structures for TMPs remain about 4–5 times underrepresented compared to non-TMPs. Today’s top methods such as AlphaFold2 accurately predict 3D structures for many TMPs, but annotating transmembrane regions remains a limiting step for proteome-wide predictions. Results: Here, we present TMbed, a novel method that inputs embeddings from protein Language Models (pLMs, here ProtT5) to predict for each residue one of four classes: transmembrane helix (TMH), transmembrane strand (TMB), signal peptide, or other. TMbed completes predictions for entire proteomes within hours on a single consumer-grade desktop machine, at performance levels similar to or better than those of methods using evolutionary information from multiple sequence alignments (MSAs) of protein families. On the per-protein level, TMbed correctly identified 94 ± 8% of the beta barrel TMPs (53 of 57) and 98 ± 1% of the alpha helical TMPs (557 of 571) in a non-redundant data set, at false positive rates well below 1% (erring on 30 of 5654 non-membrane proteins). On the per-segment level, TMbed correctly placed, on average, 9 of 10 transmembrane segments within five residues of the experimental observation. Our method can handle sequences of up to 4200 residues on standard graphics cards used in desktop PCs (e.g., NVIDIA GeForce RTX 3060). Conclusions: Based on embeddings from pLMs and two novel filters (Gaussian and Viterbi), TMbed predicts alpha helical and beta barrel TMPs at least as accurately as any other method but at lower false positive rates. Given the few false positives and its outstanding speed, TMbed might be ideal for sieving through the millions of 3D structures soon to be predicted, e.g., by AlphaFold2.
    Type of Medium: Online Resource
    ISSN: 1471-2105
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2022
    ZDB-ID: 2041484-5
    SSG: 12
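The TMbed abstract mentions a Viterbi filter applied to per-residue class predictions, but it does not give the model's states or transition probabilities. The sketch below is therefore only a generic illustration of Viterbi decoding, not TMbed's filter: it assumes hypothetical per-residue log-scores from some upstream classifier and a hand-chosen transition table, and shows how such decoding can smooth an isolated low-confidence residue inside a segment.

```python
import math


def viterbi(emissions, transitions, states):
    """Return the highest-scoring state path (list of states) under an HMM-style model.

    emissions:   one dict per residue mapping state -> log-score for that residue.
    transitions: dict mapping (previous_state, state) -> log-probability;
                 pairs that are missing are treated as forbidden (-inf).
    A uniform start distribution is assumed, so it drops out of the maximization.
    """
    if not emissions:
        return []
    neg_inf = float("-inf")
    best = [{s: emissions[0][s] for s in states}]  # best[t][s]: best score ending in s at t
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for s in states:
            score, prev = max(
                (best[t - 1][p] + transitions.get((p, s), neg_inf), p) for p in states
            )
            best[t][s] = score + emissions[t][s]
            back[t][s] = prev
    # Trace back from the best-scoring final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-state example: "O" = other, "H" = transmembrane helix.
p_helix = [0.1, 0.2, 0.8, 0.4, 0.9, 0.8, 0.2, 0.1]  # noisy per-residue helix probabilities
emissions = [{"H": math.log(p), "O": math.log(1 - p)} for p in p_helix]
stay, switch = math.log(0.9), math.log(0.1)
transitions = {("O", "O"): stay, ("H", "H"): stay, ("O", "H"): switch, ("H", "O"): switch}
print("".join(viterbi(emissions, transitions, ["O", "H"])))  # OOHHHHOO
```

Taking the per-residue argmax instead would give OOHOHHOO. With the assumed switch penalty, keeping the fourth residue (helix probability 0.4) inside the helix is cheaper than breaking the segment twice, so the decoded path contains one contiguous helix.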
  • 4
    Online Resource
    Springer Science and Business Media LLC ; 2012
    In: BMC Genomics, Springer Science and Business Media LLC, Vol. 13, No. Suppl 4 ( 2012), p. S11-
    Type of Medium: Online Resource
    ISSN: 1471-2164
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2012
    ZDB-ID: 2041499-7
    SSG: 12
  • 5
    In: Genome Biology, Springer Science and Business Media LLC, Vol. 20, No. 1 ( 2019-12)
    Abstract: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
    Type of Medium: Online Resource
    ISSN: 1474-760X
    Language: English
    Publisher: Springer Science and Business Media LLC
    Publication Date: 2019
    ZDB-ID: 2040529-7
  • 6
    Online Resource
    American Association for the Advancement of Science (AAAS) ; 2014
    In: Science, American Association for the Advancement of Science (AAAS), Vol. 346, No. 6207 ( 2014-10-17), p. 355-359
    Abstract: Human bestrophin 1 (hBest1) is a membrane protein that forms a chloride channel in the retinal pigment epithelium. Mutations in hBest1 can lead to a retinal degeneration disease known as Best disease. Yang et al. describe the structure of KpBest, a bacterial homolog of hBest1. KpBest forms a pentamer with an ion channel at its center. In contrast to hBest1, KpBest is a sodium channel. The structure suggests a mechanism for ion selectivity that was confirmed by mutagenesis of KpBest and hBest1. A model of the hBest1 channel structure based on the KpBest structure reveals how mutations cause disease. Science, this issue p. 355
    Type of Medium: Online Resource
    ISSN: 0036-8075 , 1095-9203
    Language: English
    Publisher: American Association for the Advancement of Science (AAAS)
    Publication Date: 2014
    ZDB-ID: 128410-1
    ZDB-ID: 2066996-3
    ZDB-ID: 2060783-0
    SSG: 11
  • 7
    In: Protein Science, Wiley, Vol. 32, No. 1 ( 2023-01)
    Abstract: The availability of accurate and fast artificial intelligence (AI) solutions predicting aspects of proteins is revolutionizing experimental and computational molecular biology. The webserver LambdaPP aspires to supersede PredictProtein, the first internet server making AI protein predictions available in 1992. Given a protein sequence as input, LambdaPP provides easily accessible visualizations of protein 3D structure, along with predictions at the protein level (GeneOntology, subcellular location) and the residue level (binding to metal ions, small molecules, and nucleotides; conservation; intrinsic disorder; secondary structure; alpha‐helical and beta‐barrel transmembrane segments; signal‐peptides; variant effect) in seconds. The structure prediction provided by LambdaPP, leveraging ColabFold and computed in minutes, is based on MMseqs2 multiple sequence alignments. All other feature prediction methods are based on the pLM ProtT5. Queried by a protein sequence, LambdaPP computes protein and residue predictions almost instantly for various phenotypes, including 3D structure and aspects of protein function. LambdaPP is freely available for everyone to use under embed.predictprotein.org; the interactive results for the case study can be found under https://embed.predictprotein.org/o/Q9NZC2. The frontend of LambdaPP can be found on GitHub (github.com/sacdallago/embed.predictprotein.org) and can be freely used and distributed under the academic free use license (AFL‐2). For high‐throughput applications, all methods can be executed locally via the bio‐embeddings (bioembeddings.com) python package or the docker image at ghcr.io/bioembeddings/bio_embeddings, which also includes the backend of LambdaPP.
    Type of Medium: Online Resource
    ISSN: 0961-8368 , 1469-896X
    Language: English
    Publisher: Wiley
    Publication Date: 2023
    ZDB-ID: 2000025-X
    SSG: 12
  • 8
    Online Resource
    Frontiers Media SA ; 2022
    In: Frontiers in Bioinformatics, Frontiers Media SA, Vol. 2 ( 2022-11-17)
    Abstract: Since 1992, all state-of-the-art methods for fast and sensitive identification of evolutionary, structural, and functional relations between proteins (also referred to as “homology detection”) have used sequences and sequence profiles (PSSMs). Protein Language Models (pLMs) generalize sequences, possibly capturing the same constraints as PSSMs, e.g., through embeddings. Here, we explored how to use such embeddings for nearest neighbor searches to identify relations between protein pairs with diverged sequences (remote homology detection for levels of <20% pairwise sequence identity, PIDE). While this approach excelled for proteins with single domains, we demonstrated the current challenges of applying it to multi-domain proteins and presented some ideas on how to overcome existing limitations, in principle. We observed that sufficiently challenging data set separations were crucial to provide relevant insights into the behavior of nearest neighbor search when applied to the protein embedding space, and we made all our methods readily available for others.
    Type of Medium: Online Resource
    ISSN: 2673-7647
    Language: Unknown
    Publisher: Frontiers Media SA
    Publication Date: 2022
    ZDB-ID: 3091287-8
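The abstract above describes nearest neighbor searches in pLM embedding space to detect remote homologs. As a minimal sketch under stated assumptions (not the authors' method): a per-protein vector is assumed to be the mean of the per-residue embeddings, similarity is measured by cosine similarity (the paper may use a different metric), and the protein ids and 3-dimensional toy vectors are hypothetical stand-ins for real, much higher-dimensional ProtT5 embeddings.

```python
import math


def mean_pool(per_residue):
    """Average per-residue vectors into one fixed-length per-protein vector (assumed pooling)."""
    n = len(per_residue)
    return [sum(col) / n for col in zip(*per_residue)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbors(query, lookup, k=1):
    """Return the k (id, similarity) pairs from lookup that are most cosine-similar to query."""
    scored = [(pid, cosine(query, emb)) for pid, emb in lookup.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy per-protein "embeddings" with hypothetical ids.
lookup = {"P1": [1.0, 0.0, 0.0], "P2": [0.0, 1.0, 0.0], "P3": [0.7, 0.7, 0.0]}
query = mean_pool([[0.8, 0.2, 0.0], [1.0, 0.0, 0.0]])  # [0.9, 0.1, 0.0]
for pid, sim in nearest_neighbors(query, lookup, k=2):
    print(pid, round(sim, 3))  # prints "P1 0.994", then "P3 0.781"
```

A linear scan like this is fine for small lookup sets; for proteome-scale databases an approximate nearest neighbor index would usually replace the sorted scan.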
  • 9
    Online Resource
    Oxford University Press (OUP) ; 2007
    In: Bioinformatics, Oxford University Press (OUP), Vol. 23, No. 13 ( 2007-07-01), p. i347-i353
    Abstract: Motivation: Thousands of proteins are known to bind to DNA; for most of them, the mechanism of action and the residues that bind to DNA, i.e. the binding sites, are still unknown. Experimental identification of binding sites requires expensive and laborious methods such as mutagenesis and binding assays. Hence, such studies cannot be applied on a large scale. If the 3D structure of a protein is known, it is often possible to predict DNA-binding sites in silico. However, for most proteins, such knowledge is not available. Results: It has been shown that DNA-binding residues have distinct biophysical characteristics. Here we demonstrate that these characteristics are so distinct that they enable accurate prediction of the residues that bind DNA directly from amino acid sequence, without requiring any additional experimental or structural information. In a cross-validation based on the largest non-redundant dataset of high-resolution protein–DNA complexes available today, we found that 89% of our predictions are confirmed by experimental data. Thus, it is now possible to identify DNA-binding sites on a proteomic scale even in the absence of any experimental data or 3D-structural information. Availability: http://cubic.bioc.columbia.edu/services/disis Contact: yo135@columbia.edu
    Type of Medium: Online Resource
    ISSN: 1367-4811 , 1367-4803
    Language: English
    Publisher: Oxford University Press (OUP)
    Publication Date: 2007
    ZDB-ID: 1468345-3
    SSG: 12
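The statement that "89% of our predictions are confirmed by experimental data" can be read as a precision (positive predictive value): of all residues predicted to bind DNA, the fraction observed to bind in the experimental complexes. A minimal sketch of that reading, assuming residues are identified by hypothetical (protein id, position) pairs; this is an interpretation of the reported figure, not the authors' evaluation code.

```python
def precision(predicted, observed):
    """Fraction of predicted DNA-binding residues that are experimentally observed to bind.

    Both arguments are iterables of hashable residue identifiers, e.g. (protein_id, position).
    """
    predicted = set(predicted)
    if not predicted:
        raise ValueError("precision is undefined when nothing is predicted")
    confirmed = predicted & set(observed)
    return len(confirmed) / len(predicted)


# Hypothetical toy data: 9 of the 10 predicted residues are observed binders.
predicted = [("protA", i) for i in range(10)]
observed = [("protA", i) for i in range(1, 12)]
print(precision(predicted, observed))  # 0.9
```

Precision alone says nothing about how many true binding residues were missed; here two observed binders (positions 10 and 11) are never predicted, which a recall calculation would expose.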
  • 10
    In: Bioinformatics, Oxford University Press (OUP), Vol. 17, No. 12 ( 2001-12-01), p. 1242-1243
    Abstract: Summary: Evaluation of protein structure prediction methods is difficult and time-consuming. Here, we describe EVA, a web server for assessing protein structure prediction methods, in an automated, continuous and large-scale fashion. Currently, EVA evaluates the performance of a variety of prediction methods available through the internet. Every week, the sequences of the latest experimentally determined protein structures are sent to prediction servers, results are collected, performance is evaluated, and a summary is published on the web. EVA has so far collected data for more than 3000 protein chains. These results may provide valuable insight to both developers and users of prediction methods. Availability: http://cubic.bioc.columbia.edu/eva. Contact: eva@cubic.bioc.columbia.edu
    Type of Medium: Online Resource
    ISSN: 1367-4811 , 1367-4803
    Language: English
    Publisher: Oxford University Press (OUP)
    Publication Date: 2001
    ZDB-ID: 1468345-3
    SSG: 12