Optimising window size of semantic of classification model for identification of in-text citations based on context and intent.

Bibliographic Details
Title: Optimising window size of semantic of classification model for identification of in-text citations based on context and intent.
Authors: Arshad Iqbal, Abdul Shahid, Muhammad Roman, Muhammad Tanvir Afzal, Umair Ul Hassan
Source: PLoS ONE, Vol 20, Iss 3, p e0309862 (2025)
Publisher Information: Public Library of Science (PLoS), 2025.
Publication Year: 2025
Collection: LCC:Medicine
LCC:Science
Subject Terms: Medicine, Science
More Details: Citations in scientific literature act as channels for the sharing, transfer, and development of scientific knowledge. However, not all citations hold the same significance. Numerous taxonomies and machine learning models have been developed to analyze citations, but they often overlook the internal context of these citations. Moreover, it is worth noting that selecting the appropriate word embedding and classification models is crucial for achieving superior results. Word embeddings offer n-dimensional distributed representations of text, striving to capture the nuanced meanings of words. Deep learning-based word embedding techniques have garnered significant attention and found application in various Natural Language Processing (NLP) tasks, including text classification, sentiment analysis, and citation analysis. Current state-of-the-art techniques often use small datasets with fixed window sizes, resulting in the loss of contextual meaning. This study leverages two benchmark datasets encompassing a substantial volume of in-text citations to guide the selection of an optimal word embedding window size and classification approaches. A comparative analysis of various window sizes for in-text citations is conducted to identify crucial citations effectively. Additionally, Word2Vec embedding is employed in conjunction with deep learning models and machine learning models such as Convolutional Neural Networks (CNNs), Gated Recurrent Units (GRUs), Long Short-Term Memory (LSTM) networks, Support Vector Machines (SVM), Decision Trees, and Naive Bayes. The evaluation employs precision, recall, F1-score, and accuracy metrics for each combination of window sizes. The findings reveal that, particularly for lengthy in-text citations, larger citation windows are more adept at capturing the semantic essence of the references. Within the scope of this study, window sizes of 10 achieve superior accuracy and precision with both machine and deep learning models.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1932-6203
Relation: https://doaj.org/toc/1932-6203
DOI: 10.1371/journal.pone.0309862
Access URL: https://doaj.org/article/77fa9176078f427b8c444b952aec8efe
Accession Number: edsdoj.77fa9176078f427b8c444b952aec8efe
Database: Directory of Open Access Journals
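The abstract turns on the notion of a window size. The following minimal Python sketch illustrates the generic idea and is not the authors' implementation. It assumes whitespace-tokenised text and a hypothetical `[CIT]` placeholder marking the citation position. It shows which tokens fall inside a symmetric window of a given size around a target token, and the skip-gram (center, context) pairs a Word2Vec-style model would train on for that window. The function names `context_window` and `skipgram_pairs` are illustrative only.

```python
from typing import List, Tuple


def context_window(tokens: List[str], center: int, window: int) -> List[str]:
    """Return the tokens within `window` positions on either side of tokens[center].

    The center token itself is excluded; the window is clipped at sentence edges.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    start = max(0, center - window)
    end = min(len(tokens), center + window + 1)
    return tokens[start:center] + tokens[center + 1:end]


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Enumerate (center, context) pairs for every position, using a fixed window."""
    pairs: List[Tuple[str, str]] = []
    for i, word in enumerate(tokens):
        for ctx in context_window(tokens, i, window):
            pairs.append((word, ctx))
    return pairs


if __name__ == "__main__":
    # Hypothetical citing sentence; "[CIT]" stands in for the in-text citation.
    tokens = "we extend the method of [CIT] to larger corpora".split()
    i = tokens.index("[CIT]")
    print(context_window(tokens, i, 2))   # ['method', 'of', 'to', 'larger']
    print(context_window(tokens, i, 10))  # every other token in this short sentence
    print(len(skipgram_pairs(tokens, 2)))
```

A larger window gathers more surrounding words into each target's context, which is the intuition behind the abstract's observation that longer in-text citations benefit from larger windows. This sketch uses a fixed window for clarity. Production Word2Vec implementations, such as the original word2vec tool, randomly shrink the effective window at each position, so the exact pairs they train on differ.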