From gene modules to gene markers: an integrated AI-human approach selects CD38 to represent plasma cell-associated transcriptional signatures

Bibliographic Details
Title: From gene modules to gene markers: an integrated AI-human approach selects CD38 to represent plasma cell-associated transcriptional signatures
Authors: Basirudeen Syed Ahamed Kabeer, Bishesh Subba, Darawan Rinchai, Mohammed Toufiq, Taushif Khan, Marina Yurieva, Damien Chaussabel
Source: Frontiers in Medicine, Vol 12 (2025)
Publisher Information: Frontiers Media S.A., 2025.
Publication Year: 2025
Collection: LCC:Medicine (General)
Subject Terms: large language models, gene prioritization, targeted transcriptional profiling, plasma cell responses, CD38, biomarker discovery, Medicine (General), R5-920
More Details: Background: Knowledge-driven prioritization of candidate genes derived from large-scale molecular profiling data for targeted transcriptional profiling assays is challenging due to the vast amount of biomedical literature that needs to be harnessed. We present a workflow leveraging Large Language Models (LLMs) to prioritize candidate genes within module M12.15, a plasma cell-associated module from the BloodGen3 repertoire, by integrating knowledge-driven prioritization with data-driven analysis of transcriptome profiles. Methods: The workflow involves a two-step process: (1) high-throughput screening using LLMs to score and rank the 17 genes of module M12.15 based on six predefined criteria, and (2) prioritization employing high-resolution scoring and fact-checking, with human experts validating and refining AI-generated scores. Results: The first step identified five candidate genes (CD38, TNFRSF17, IGJ, TOP2A, and TYMS). Following human-augmented LLM scoring and fact checking, as part of the second step, CD38 and TNFRSF17 emerged as the top candidates. Next, transcriptome profiling data from three datasets was incorporated in the workflow to assess expression levels and correlations with the module average across various conditions and cell types. It is on this basis that CD38 was prioritized as the top candidate, with TNFRSF17 and IGJ identified as promising alternatives. Conclusion: This study introduces a systematic framework that integrates LLMs with human expertise for gene prioritization. Our analysis identified CD38, TNFRSF17, and IGJ as the top candidates within the plasma cell-associated module M12.15 from the BloodGen3 repertoire, with their relative rankings varying systematically based on specific evaluation criteria, from plasma cell biology to therapeutic relevance. This criterion-dependent ranking demonstrates the ability of the framework to perform nuanced, multi-faceted evaluations. By combining knowledge-driven analysis with data-driven metrics, our approach provides a balanced and comprehensive method for biomarker selection. The methodology established here offers a reproducible and scalable approach that can be applied across diverse biological contexts and extended to analyze large module repertoires.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2296-858X
Relation: https://www.frontiersin.org/articles/10.3389/fmed.2025.1510431/full; https://doaj.org/toc/2296-858X
DOI: 10.3389/fmed.2025.1510431
Access URL: https://doaj.org/article/a7406f25ea894224b0f97032f424edb0
Accession Number: edsdoj.7406f25ea894224b0f97032f424edb0
Database: Directory of Open Access Journals
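Illustrative note: the abstract mentions assessing each gene's correlation with the module average. A minimal sketch of that kind of data-driven step is given below. It is not the authors' code; the function names, the use of Pearson correlation, the inclusion of each gene in its own module average, and the toy expression values (placeholder gene labels, synthetic numbers) are all assumptions made for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_by_module_correlation(expr):
    """Rank genes by correlation with the per-sample module average.

    expr maps a gene label to its expression values, one per sample.
    Assumption: the module average is the plain mean over all genes in
    the module, including the gene being scored.
    """
    genes = list(expr)
    n_samples = len(expr[genes[0]])
    module_avg = [
        sum(expr[g][i] for g in genes) / len(genes) for i in range(n_samples)
    ]
    scores = {g: pearson(expr[g], module_avg) for g in genes}
    # Highest correlation first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic values with placeholder labels; not data from the study.
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.5],
    "GENE_C": [4.0, 1.0, 3.5, 2.0],
}

ranking = rank_by_module_correlation(toy_expression)
for gene, r in ranking:
    print(f"{gene}: r = {r:.3f}")
```

In this toy input, the gene whose profile does not track the other two ends up last in the ranking. A real analysis would also need to consider expression level, dataset-specific normalization, and the knowledge-driven scores described in the abstract.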