GSCtool: A Novel Descriptor that Characterizes the Genome for Applying Machine Learning in Genomics

Bibliographic Details
Title: GSCtool: A Novel Descriptor that Characterizes the Genome for Applying Machine Learning in Genomics
Authors: Zijie Shen, Enhui Shen, Qian-Hao Zhu, Longjiang Fan, Quan Zou, Chu-Yu Ye
Source: Advanced Intelligent Systems, Vol 5, Iss 12 (2023)
Publisher Information: Wiley, 2023.
Publication Year: 2023
Collection: LCC:Computer engineering. Computer hardware
Subject Terms: genome-to-phenotype (G2P), genomic descriptor, genomic machine learning, supervised learning, variety protection, Computer engineering. Computer hardware, TK7885-7895, Control engineering systems. Automatic machinery (General), TJ212-225
More Details: Machine learning (ML) is one of the core driving forces for the next breeding stage, Breeding 4.0. A genotype matrix based on single-nucleotide polymorphisms (SNPs) is often used in ML for genome-to-phenotype prediction. The genotype matrix has an inherent defect: the feature spaces it generates are inconsistent across different individuals or groups, which hinders the application of ML. To overcome this challenge, a genome descriptor, the Genic SNPs Composition Tool (GSCtool), is developed. It counts the number of SNPs in each gene of the genome, so the dimension of the feature vectors equals the number of annotated genes in a species. Compared with using the genotype matrix, using GSCtool significantly decreases model training time and gives higher accuracy in phenotype prediction. GSCtool also performs well in variety identification, which is useful in crop variety protection. In general, GSCtool will help facilitate the application and study of genomic ML. The source code and test data of GSCtool are freely available at https://github.com/SZJhacker/GSCtool and https://gitee.com/shenzijie/GSCtool.
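The per-gene counting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the GSCtool source code; the function name `gene_snp_counts`, the tuple layouts, the gene coordinates, and the sample SNP positions are all hypothetical and chosen only for illustration. It assumes 1-based inclusive gene coordinates and counts a SNP once for every gene whose interval contains it.

```python
def gene_snp_counts(genes, snps):
    """Return one SNP count per annotated gene, in annotation order.

    genes: list of (gene_id, chrom, start, end), 1-based inclusive (assumed layout)
    snps:  iterable of (chrom, pos) for one individual (assumed layout)
    """
    counts = {gene_id: 0 for gene_id, _, _, _ in genes}
    for chrom, pos in snps:
        # Naive scan over all genes; fine for a sketch, slow for real genomes.
        for gene_id, g_chrom, start, end in genes:
            if chrom == g_chrom and start <= pos <= end:
                counts[gene_id] += 1
    return [counts[gene_id] for gene_id, _, _, _ in genes]


# Hypothetical annotation with three genes.
genes = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 300, 400),
    ("geneC", "chr2", 50, 150),
]

# Two hypothetical individuals whose SNPs fall at entirely different positions.
sample_1 = [("chr1", 120), ("chr1", 180), ("chr1", 350), ("chr2", 10)]
sample_2 = [("chr2", 60), ("chr2", 75), ("chr2", 149), ("chr1", 250)]

vector_1 = gene_snp_counts(genes, sample_1)
vector_2 = gene_snp_counts(genes, sample_2)
print(vector_1)  # [2, 1, 0]
print(vector_2)  # [0, 0, 3]
```

Although the two individuals share no SNP positions, both vectors have one entry per annotated gene, which is the consistent feature space the abstract describes. SNPs outside any gene are ignored in this sketch.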
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2640-4567
Relation: https://doaj.org/toc/2640-4567
DOI: 10.1002/aisy.202300426
Access URL: https://doaj.org/article/833d710b2de44a22ad0b79ff6e27be02
Accession Number: edsdoj.833d710b2de44a22ad0b79ff6e27be02
Database: Directory of Open Access Journals