Bibliographic Details
Title: |
PLM-eXplain: Divide and Conquer the Protein Embedding Space |
Authors: |
van Eck, Jan; Gogishvili, Dea; Silva, Wilson; Abeln, Sanne |
Publication Year: |
2025 |
Collection: |
Computer Science; Quantitative Biology |
Subject Terms: |
Quantitative Biology - Biomolecules, Computer Science - Artificial Intelligence, Computer Science - Machine Learning |
More Details: |
Protein language models (PLMs) have revolutionised computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. We present an explainable adapter layer, PLM-eXplain (PLM-X), that bridges this gap by factoring PLM embeddings into two components: an interpretable subspace based on established biochemical features, and a residual subspace that preserves the model's predictive power. Using embeddings from ESM2, our adapter incorporates well-established properties, including secondary structure and hydropathy, while maintaining high performance. We demonstrate the effectiveness of our approach across three protein-level classification tasks: prediction of extracellular vesicle association, identification of transmembrane helices, and prediction of aggregation propensity. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalisable solution for enhancing PLM interpretability across various downstream applications. This work addresses a critical need in computational biology by providing a bridge between powerful deep learning models and actionable biological insights. |
Document Type: |
Working Paper |
Access URL: |
http://arxiv.org/abs/2504.07156 |
Accession Number: |
edsarx.2504.07156 |
Database: |
arXiv |