KOBV Portal

hits 1 - 10 | 292 hits

Online Resource

Embeddings from protein language models predict conservation and variant effects

Marquet, Céline ; Heinzinger, Michael ; Olenyi, Tobias ; [et al.]

Springer Science and Business Media LLC ; 2022

In: Human Genetics Vol. 141, No. 10 ( 2022-10), p. 1629-1647

In: Human Genetics, Springer Science and Business Media LLC, Vol. 141, No. 10 ( 2022-10), p. 1629-1647

Abstract: The emergence of SARS-CoV-2 variants stressed the demand for tools allowing to interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation almost as accurately from single sequences as ConSeq using MSAs (two-state Matthews Correlation Coefficient—MCC—for ProtT5 embeddings of 0.596 ± 0.006 vs. 0.608 ± 0.006 for ConSeq). Inputting the conservation prediction along with BLOSUM62 substitution scores and pLM mask reconstruction probabilities into a simplistic logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA) predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments to other methods (incl. ESM-1v, DeepSequence, and GEMME) revealed our approach as competitive with the state-of-the-art (SOTA) methods using MSA input. No method outperformed all others, neither consistently nor statistically significantly, independently of the performance measure applied (Spearman and Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. 
Our method predicted SAV effects for the entire human proteome (~20k proteins) within 40 min on one Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA, and PredictProtein.

Type of Medium: Online Resource

ISSN: 0340-6717 , 1432-1203

DOI: 10.1007/s00439-021-02411-y

RVK: WA 15000

Language: English

Publisher: Springer Science and Business Media LLC

Publication Date: 2022

ZDB-ID: 1459188-1

SSG: 12

Online Resource

Membrane protein prediction methods

Punta, Marco ; Forrest, Lucy R. ; Bigelow, Henry ; [et al.]

Elsevier BV ; 2007

In: Methods Vol. 41, No. 4 ( 2007-04), p. 460-474

In: Methods, Elsevier BV, Vol. 41, No. 4 ( 2007-04), p. 460-474

Type of Medium: Online Resource

ISSN: 1046-2023

DOI: 10.1016/j.ymeth.2006.07.026

Language: English

Publisher: Elsevier BV

Publication Date: 2007

ZDB-ID: 1471152-7

SSG: 12

Online Resource

TMbed: transmembrane proteins predicted through language model embeddings

Bernhofer, Michael ; Rost, Burkhard

Springer Science and Business Media LLC ; 2022

In: BMC Bioinformatics Vol. 23, No. 1 ( 2022-08-08)

In: BMC Bioinformatics, Springer Science and Business Media LLC, Vol. 23, No. 1 ( 2022-08-08)

Abstract: Despite the immense importance of transmembrane proteins (TMP) for molecular biology and medicine, experimental 3D structures for TMPs remain about 4–5 times underrepresented compared to non-TMPs. Today’s top methods such as AlphaFold2 accurately predict 3D structures for many TMPs, but annotating transmembrane regions remains a limiting step for proteome-wide predictions. Results: Here, we present TMbed, a novel method inputting embeddings from protein Language Models (pLMs, here ProtT5), to predict for each residue one of four classes: transmembrane helix (TMH), transmembrane strand (TMB), signal peptide, or other. TMbed completes predictions for entire proteomes within hours on a single consumer-grade desktop machine at performance levels similar or better than methods, which are using evolutionary information from multiple sequence alignments (MSAs) of protein families. On the per-protein level, TMbed correctly identified 94 ± 8% of the beta barrel TMPs (53 of 57) and 98 ± 1% of the alpha helical TMPs (557 of 571) in a non-redundant data set, at false positive rates well below 1% (erred on 30 of 5654 non-membrane proteins). On the per-segment level, TMbed correctly placed, on average, 9 of 10 transmembrane segments within five residues of the experimental observation. Our method can handle sequences of up to 4200 residues on standard graphics cards used in desktop PCs (e.g., NVIDIA GeForce RTX 3060). Conclusions: Based on embeddings from pLMs and two novel filters (Gaussian and Viterbi), TMbed predicts alpha helical and beta barrel TMPs at least as accurately as any other method but at lower false positive rates. Given the few false positives and its outstanding speed, TMbed might be ideal to sieve through millions of 3D structures soon to be predicted, e.g., by AlphaFold2.

Type of Medium: Online Resource

ISSN: 1471-2105

DOI: 10.1186/s12859-022-04873-x

Language: English

Publisher: Springer Science and Business Media LLC

Publication Date: 2022

ZDB-ID: 2041484-5

SSG: 12

Online Resource

Schaefer, Christian ; Bromberg, Yana ; Achten, Dominik ; [et al.]

Springer Science and Business Media LLC ; 2012

In: BMC Genomics Vol. 13, No. Suppl 4 ( 2012), p. S11-

In: BMC Genomics, Springer Science and Business Media LLC, Vol. 13, No. Suppl 4 ( 2012), p. S11-

Type of Medium: Online Resource

ISSN: 1471-2164

DOI: 10.1186/1471-2164-13-S4-S11

Language: English

Publisher: Springer Science and Business Media LLC

Publication Date: 2012

ZDB-ID: 2041499-7

SSG: 12

Online Resource

The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Zhou, Naihui ; Jiang, Yuxiang ; Bergquist, Timothy R. ; [et al.]

Springer Science and Business Media LLC ; 2019

In: Genome Biology Vol. 20, No. 1 ( 2019-12)

In: Genome Biology, Springer Science and Business Media LLC, Vol. 20, No. 1 ( 2019-12)

Abstract: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a novel and major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

Type of Medium: Online Resource

ISSN: 1474-760X

DOI: 10.1186/s13059-019-1835-8

Language: English

Publisher: Springer Science and Business Media LLC

Publication Date: 2019

ZDB-ID: 2040529-7

Online Resource

Structure and selectivity in bestrophin ion channels

Yang, Tingting ; Liu, Qun ; Kloss, Brian ; [et al.]

American Association for the Advancement of Science (AAAS) ; 2014

In: Science Vol. 346, No. 6207 ( 2014-10-17), p. 355-359

In: Science, American Association for the Advancement of Science (AAAS), Vol. 346, No. 6207 ( 2014-10-17), p. 355-359

Abstract: Human bestrophin 1 (hBest1) is a membrane protein that forms a chloride channel in the retinal pigment epithelium. Mutations in hBest1 can lead to a retinal degeneration disease known as Best disease. Yang et al. describe the structure of KpBest, a bacterial homolog of hBest1. KpBest forms a pentamer with an ion channel at its center. In contrast to hBest1, KpBest is a sodium channel. The structure suggests a mechanism for ion selectivity that was confirmed by mutagenesis of KpBest and hBest1. A model of the hBest1 channel structure based on the KpBest structure reveals how mutations cause disease. Science, this issue p. 355

Type of Medium: Online Resource

ISSN: 0036-8075 , 1095-9203

DOI: 10.1126/science.1259723

RVK: TA 1000

RVK: WA 15000

Language: English

Publisher: American Association for the Advancement of Science (AAAS)

Publication Date: 2014

ZDB-ID: 128410-1

ZDB-ID: 2066996-3

ZDB-ID: 2060783-0

SSG: 11

Online Resource

LambdaPP: Fast and accessible protein‐specific phenotype predictions

Olenyi, Tobias ; Marquet, Céline ; Heinzinger, Michael ; [et al.]

Wiley ; 2023

In: Protein Science Vol. 32, No. 1 ( 2023-01)

In: Protein Science, Wiley, Vol. 32, No. 1 ( 2023-01)

Abstract: The availability of accurate and fast artificial intelligence (AI) solutions predicting aspects of proteins is revolutionizing experimental and computational molecular biology. The webserver LambdaPP aspires to supersede PredictProtein, the first internet server making AI protein predictions available in 1992. Given a protein sequence as input, LambdaPP provides easily accessible visualizations of protein 3D structure, along with predictions at the protein level (GeneOntology, subcellular location), and the residue level (binding to metal ions, small molecules, and nucleotides; conservation; intrinsic disorder; secondary structure; alpha‐helical and beta‐barrel transmembrane segments; signal‐peptides; variant effect) in seconds. The structure prediction provided by LambdaPP, leveraging ColabFold and computed in minutes, is based on MMseqs2 multiple sequence alignments. All other feature prediction methods are based on the pLM ProtT5. Queried by a protein sequence, LambdaPP computes protein and residue predictions almost instantly for various phenotypes, including 3D structure and aspects of protein function. LambdaPP is freely available for everyone to use under embed.predictprotein.org; the interactive results for the case study can be found under https://embed.predictprotein.org/o/Q9NZC2. The frontend of LambdaPP can be found on GitHub (github.com/sacdallago/embed.predictprotein.org), and can be freely used and distributed under the academic free use license (AFL‐2). For high‐throughput applications, all methods can be executed locally via the bio‐embeddings (bioembeddings.com) python package, or docker image at ghcr.io/bioembeddings/bio_embeddings, which also includes the backend of LambdaPP.

Type of Medium: Online Resource

ISSN: 0961-8368 , 1469-896X

DOI: 10.1002/pro.v32.1

DOI: 10.1002/pro.4524

RVK: WA 15000

Language: English

Publisher: Wiley

Publication Date: 2023

ZDB-ID: 2000025-X

SSG: 12

Online Resource

DataSheet2.PDF

Schütze, Konstantin ; Heinzinger, Michael ; Steinegger, Martin ; [et al.]

Frontiers Media SA ; 2022

In: Frontiers in Bioinformatics Vol. 2 ( 2022-11-17)

In: Frontiers in Bioinformatics, Frontiers Media SA, Vol. 2 ( 2022-11-17)

Abstract: Since 1992, all state-of-the-art methods for fast and sensitive identification of evolutionary, structural, and functional relations between proteins (also referred to as “homology detection”) use sequences and sequence-profiles (PSSMs). Protein Language Models (pLMs) generalize sequences, possibly capturing the same constraints as PSSMs, e.g., through embeddings. Here, we explored how to use such embeddings for nearest neighbor searches to identify relations between protein pairs with diverged sequences (remote homology detection for levels of <20% pairwise sequence identity, PIDE). While this approach excelled for proteins with single domains, we demonstrated the current challenges applying this to multi-domain proteins and presented some ideas how to overcome existing limitations, in principle. We observed that sufficiently challenging data set separations were crucial to provide deeply relevant insights into the behavior of nearest neighbor search when applied to the protein embedding space, and made all our methods readily available for others.

Type of Medium: Online Resource

ISSN: 2673-7647

DOI: 10.3389/fbinf.2022.1033775

DOI: 10.3389/fbinf.2022.1033775.s001

Language: Unknown

Publisher: Frontiers Media SA

Publication Date: 2022

ZDB-ID: 3091287-8

Online Resource

Prediction of DNA-binding residues from sequence

Ofran, Yanay ; Mysore, Venkatesh ; Rost, Burkhard

Oxford University Press (OUP) ; 2007

In: Bioinformatics Vol. 23, No. 13 ( 2007-07-01), p. i347-i353

In: Bioinformatics, Oxford University Press (OUP), Vol. 23, No. 13 ( 2007-07-01), p. i347-i353

Abstract: Motivation: Thousands of proteins are known to bind to DNA; for most of them the mechanism of action and the residues that bind to DNA, i.e. the binding sites, are yet unknown. Experimental identification of binding sites requires expensive and laborious methods such as mutagenesis and binding assays. Hence, such studies are not applicable on a large scale. If the 3D structure of a protein is known, it is often possible to predict DNA-binding sites in silico. However, for most proteins, such knowledge is not available. Results: It has been shown that DNA-binding residues have distinct biophysical characteristics. Here we demonstrate that these characteristics are so distinct that they enable accurate prediction of the residues that bind DNA directly from amino acid sequence, without requiring any additional experimental or structural information. In a cross-validation based on the largest non-redundant dataset of high-resolution protein–DNA complexes available today, we found that 89% of our predictions are confirmed by experimental data. Thus, it is now possible to identify DNA-binding sites on a proteomic scale even in the absence of any experimental data or 3D-structural information. Availability: http://cubic.bioc.columbia.edu/services/disis Contact: yo135@columbia.edu

Type of Medium: Online Resource

ISSN: 1367-4811 , 1367-4803

DOI: 10.1093/bioinformatics/btm174

Language: English

Publisher: Oxford University Press (OUP)

Publication Date: 2007

ZDB-ID: 1468345-3

SSG: 12

Online Resource

EVA: continuous automatic evaluation of protein structure prediction servers

Eyrich, Volker A. ; Martí-Renom, Marc A. ; Przybylski, Dariusz ; [et al.]

Oxford University Press (OUP) ; 2001

In: Bioinformatics Vol. 17, No. 12 ( 2001-12-01), p. 1242-1243

In: Bioinformatics, Oxford University Press (OUP), Vol. 17, No. 12 ( 2001-12-01), p. 1242-1243

Abstract: Summary: Evaluation of protein structure prediction methods is difficult and time-consuming. Here, we describe EVA, a web server for assessing protein structure prediction methods, in an automated, continuous and large-scale fashion. Currently, EVA evaluates the performance of a variety of prediction methods available through the internet. Every week, the sequences of the latest experimentally determined protein structures are sent to prediction servers, results are collected, performance is evaluated, and a summary is published on the web. EVA has so far collected data for more than 3000 protein chains. These results may provide valuable insight to both developers and users of prediction methods. Availability: http://cubic.bioc.columbia.edu/eva. Contact: eva@cubic.bioc.columbia.edu

Type of Medium: Online Resource

ISSN: 1367-4811 , 1367-4803

DOI: 10.1093/bioinformatics/17.12.1242

Language: English

Publisher: Oxford University Press (OUP)

Publication Date: 2001

ZDB-ID: 1468345-3

SSG: 12

Kooperativer Bibliotheksverbund

Berlin Brandenburg