PLM-eXplain: Divide and Conquer the Protein Embedding Space

Bibliographic Details
Title:	PLM-eXplain: Divide and Conquer the Protein Embedding Space
Authors:	van Eck, Jan; Gogishvili, Dea; Silva, Wilson; Abeln, Sanne
Publication Year:	2025
Collection:	Computer Science; Quantitative Biology
Subject Terms:	Quantitative Biology - Biomolecules, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
More Details:	Protein language models (PLMs) have revolutionised computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. We present an explainable adapter layer, PLM-eXplain (PLM-X), that bridges this gap by factoring PLM embeddings into two components: an interpretable subspace based on established biochemical features, and a residual subspace that preserves the model's predictive power. Using embeddings from ESM2, our adapter incorporates well-established properties, including secondary structure and hydropathy, while maintaining high performance. We demonstrate the effectiveness of our approach across three protein-level classification tasks: prediction of extracellular vesicle association, identification of transmembrane helices, and prediction of aggregation propensity. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalisable solution for enhancing PLM interpretability across various downstream applications. This work addresses a critical need in computational biology by providing a bridge between powerful deep learning models and actionable biological insights.
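To make the idea of "factoring" an embedding into an interpretable subspace plus a residual subspace concrete, here is a deliberately simple linear stand-in. It is not the authors' PLM-X adapter, which is learned and whose details are not given in this record. The function name `factor_embeddings`, the least-squares probe, and the synthetic data are all hypothetical assumptions. The sketch fits a linear map from embeddings to known feature values (standing in for scores such as hydropathy or secondary structure), projects each embedding onto the span of that map to get the feature-aligned part, and keeps the rest as the residual, so the split is lossless.

```python
import numpy as np


def factor_embeddings(E: np.ndarray, F: np.ndarray):
    """Split embeddings E (n x d) into a feature-aligned part and a residual.

    Hypothetical linear stand-in for an explainable adapter (not PLM-X itself).
    F (n x k) holds known feature values per sample, e.g. hydropathy or
    secondary-structure scores.
    """
    # Least-squares linear probe: W (d x k) maps embeddings to feature values.
    W, *_ = np.linalg.lstsq(E, F, rcond=None)
    # Orthogonal projector onto the span of W's columns: P = W W^+.
    P = W @ np.linalg.pinv(W)
    interpretable = E @ P          # component that carries the feature signal
    residual = E - interpretable   # everything else, kept for predictive power
    return W, interpretable, residual


# Synthetic demo: random vectors stand in for ESM2 embeddings (real ones are wider).
rng = np.random.default_rng(0)
E = rng.normal(size=(500, 32))
true_W = rng.normal(size=(32, 2))
# Two synthetic "feature" columns (hypothetical hydropathy and helix scores).
F = E @ true_W + 0.01 * rng.normal(size=(500, 2))

W, interp, resid = factor_embeddings(E, F)
coords = E @ W  # interpretable coordinates, expressed in feature units

print(np.allclose(interp + resid, E))             # True: the split is lossless
print(np.allclose(resid @ W, 0.0, atol=1e-8))     # True: residual ignores the probe
```

The design choice here is orthogonality: because the residual has no component along the probe directions, any downstream classifier's reliance on the features can be read off the interpretable coordinates, while the residual still carries the remaining information. A learned, non-linear adapter would relax this linear assumption.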
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2504.07156
Accession Number:	edsarx.2504.07156
Database:	arXiv