Bibliographic Details
Title: |
From gene modules to gene markers: an integrated AI-human approach selects CD38 to represent plasma cell-associated transcriptional signatures |
Authors: |
Basirudeen Syed Ahamed Kabeer, Bishesh Subba, Darawan Rinchai, Mohammed Toufiq, Taushif Khan, Marina Yurieva, Damien Chaussabel |
Source: |
Frontiers in Medicine, Vol 12 (2025) |
Publisher Information: |
Frontiers Media S.A., 2025. |
Publication Year: |
2025 |
Collection: |
LCC:Medicine (General) |
Subject Terms: |
large language models, gene prioritization, targeted transcriptional profiling, plasma cell responses, CD38, biomarker discovery, Medicine (General), R5-920 |
More Details: |
Background: Knowledge-driven prioritization of candidate genes derived from large-scale molecular profiling data for targeted transcriptional profiling assays is challenging due to the vast amount of biomedical literature that needs to be harnessed. We present a workflow leveraging Large Language Models (LLMs) to prioritize candidate genes within module M12.15, a plasma cell-associated module from the BloodGen3 repertoire, by integrating knowledge-driven prioritization with data-driven analysis of transcriptome profiles.
Methods: The workflow involves a two-step process: (1) high-throughput screening using LLMs to score and rank the 17 genes of module M12.15 based on six predefined criteria, and (2) prioritization employing high-resolution scoring and fact-checking, with human experts validating and refining AI-generated scores.
Results: The first step identified five candidate genes (CD38, TNFRSF17, IGJ, TOP2A, and TYMS). Following human-augmented LLM scoring and fact checking, as part of the second step, CD38 and TNFRSF17 emerged as the top candidates. Next, transcriptome profiling data from three datasets was incorporated in the workflow to assess expression levels and correlations with the module average across various conditions and cell types. It is on this basis that CD38 was prioritized as the top candidate, with TNFRSF17 and IGJ identified as promising alternatives.
Conclusion: This study introduces a systematic framework that integrates LLMs with human expertise for gene prioritization. Our analysis identified CD38, TNFRSF17, and IGJ as the top candidates within the plasma cell-associated module M12.15 from the BloodGen3 repertoire, with their relative rankings varying systematically based on specific evaluation criteria, from plasma cell biology to therapeutic relevance. This criterion-dependent ranking demonstrates the ability of the framework to perform nuanced, multi-faceted evaluations. By combining knowledge-driven analysis with data-driven metrics, our approach provides a balanced and comprehensive method for biomarker selection. The methodology established here offers a reproducible and scalable approach that can be applied across diverse biological contexts and extended to analyze large module repertoires. |
Document Type: |
article |
File Description: |
electronic resource |
Language: |
English |
ISSN: |
2296-858X |
Relation: |
https://www.frontiersin.org/articles/10.3389/fmed.2025.1510431/full; https://doaj.org/toc/2296-858X |
DOI: |
10.3389/fmed.2025.1510431 |
Access URL: |
https://doaj.org/article/a7406f25ea894224b0f97032f424edb0 |
Accession Number: |
edsdoj.7406f25ea894224b0f97032f424edb0 |
Database: |
Directory of Open Access Journals |