Urban and rural disparities in stroke prediction using machine learning among Chinese older adults

Bibliographic Details
Title: Urban and rural disparities in stroke prediction using machine learning among Chinese older adults
Authors: Jingjing Zhu, Luotao Lin, Lei Si, Hailei Zhao, Hualing Song, Xianglong Xu
Source: Scientific Reports, Vol 15, Iss 1, Pp 1-9 (2025)
Publisher Information: Nature Portfolio, 2025.
Publication Year: 2025
Collection: LCC:Medicine
LCC:Science
Subject Terms: Stroke, Prediction, Machine learning, Urban and rural disparities, Middle-aged and elderly adults, Medicine, Science
More Details: Abstract: Stroke is a significant health concern in China. Differences in stroke risk between rural and urban areas have been highlighted in prior research. However, there is a scarcity of studies on urban-rural differences in predicting stroke. This study aimed to develop stroke prediction models, and urban-rural subgroup analyses were conducted to explore disparities in determinants among middle-aged and older adults. We employed nine machine learning algorithms, namely logistic regression (LR), adaptive boosting classifier, support vector machines, extreme gradient boosting, random forest, Gaussian naive Bayes (GNB), gradient boosting machine, light gradient boosting decision machine, and K Nearest Neighbours, to build stroke prediction models and analyze urban-rural subgroups. The data came from 9,413 individuals aged 45 years and above in the 2011 wave of the China Health and Retirement Longitudinal Study (CHARLS). In the total population, GNB (AUC = 0.76) was the best model for predicting strokes, and the ten most important variables were the time taken for repeated chair stands, the chair height from floor to seat, knee height, creatinine, complete repeated chair stands, mean corpuscular volume, platelet, uric acid, body mass index, and white blood cell. In the rural subgroup, LR and GNB (AUC = 0.76) were the best, and the ten most important variables were the time taken for repeated chair stands, creatinine, platelet, the chair height from floor to seat, knee height, complete repeated chair stands, pulse, white blood cell, maintaining semi-tandem balance statically, and uric acid.
In the urban subgroup, LR (AUC = 0.67) was the best, and the ten most important variables were the time taken for repeated chair stands, mean corpuscular volume, maintaining semi-tandem balance statically, uric acid, right-hand grip strength, age, blood urea nitrogen, use of trunk, arms, legs for semi-tandem balance, number of marriages, and night sleep duration. The time taken for repeated chair stands was more critical in the stroke risk model for rural individuals. Uric acid and maintaining semi-tandem balance statically were more critical in the stroke risk model for urban individuals. Our results revealed the importance of knee height and physical function predictors for stroke and highlighted the differences in determinants between urban and rural individuals, proposing targeted stroke prevention and control strategies in different populations in terms of physical function.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2045-2322
Relation: https://doaj.org/toc/2045-2322
DOI: 10.1038/s41598-025-91157-y
Access URL: https://doaj.org/article/efc40e9819ab453e86d240501a5f4458
Accession Number: edsdoj.fc40e9819ab453e86d240501a5f4458
Database: Directory of Open Access Journals
FullText Links:
  – Type: pdflink
    Url: https://content.ebscohost.com/cds/retrieve?content=AQICAHjPtM4BHU3ZchRwgzYmadcigk49r9CVlbU7V5F6lgH7WwG3oOM3ZoVOrSzIkEHEA7noAAAA4jCB3wYJKoZIhvcNAQcGoIHRMIHOAgEAMIHIBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEEDFUbp5UDMdXeq2FhTgIBEICBmi0TqcUU48Qq6V5g7jvg7ZEH28qZHL21LC1uVAqhRSwJ1MrKxqdIi0AG2yAxRs8U0Naj30MQA_e7FdByxQvdww-TMCJxxz5EG9fc-bPVHR9nQzLWQbE9UFX3JKNApbUHT-ts2yhOEkXP_rluXpNZyDUJl2ygbbo_y_nAa73pGKhRgUbWNv6ZY8NKppN7KRO_mCXOvOWEaWXV3ds=
Text:
  Availability: 1
  Value: <anid>AN0183282862;[fkqs]25feb.25;2025Feb28.03:47;v2.2.500</anid> <title id="AN0183282862-1">Urban and rural disparities in stroke prediction using machine learning among Chinese older adults </title> <p>Stroke is a significant health concern in China. Differences in stroke risk between rural and urban areas have been highlighted in prior research. However, there is a scarcity of studies on urban-rural differences in predicting stroke. This study aimed to develop stroke prediction models, and urban-rural subgroup analyses were conducted to explore disparities in determinants among middle-aged and older adults. We employed nine machine learning algorithms, namely logistic regression (LR), adaptive boosting classifier, support vector machines, extreme gradient boosting, random forest, Gaussian naive Bayes (GNB), gradient boosting machine, light gradient boosting decision machine, and K Nearest Neighbours, to build stroke prediction models and analyze urban-rural subgroups. The data came from 9,413 individuals aged 45 years and above in the 2011 wave of the China Health and Retirement Longitudinal Study (CHARLS). In the total population, GNB (AUC = 0.76) was the best model for predicting strokes, and the ten most important variables were the time taken for repeated chair stands, the chair height from floor to seat, knee height, creatinine, complete repeated chair stands, mean corpuscular volume, platelet, uric acid, body mass index, and white blood cell. In the rural subgroup, LR and GNB (AUC = 0.76) were the best, and the ten most important variables were the time taken for repeated chair stands, creatinine, platelet, the chair height from floor to seat, knee height, complete repeated chair stands, pulse, white blood cell, maintaining semi-tandem balance statically, and uric acid. 
In the urban subgroup, LR (AUC = 0.67) was the best, and the ten most important variables were the time taken for repeated chair stands, mean corpuscular volume, maintaining semi-tandem balance statically, uric acid, right-hand grip strength, age, blood urea nitrogen, use of trunk, arms, legs for semi-tandem balance, number of marriages, and night sleep duration. The time taken for repeated chair stands was more critical in the stroke risk model for rural individuals. Uric acid and maintaining semi-tandem balance statically were more critical in the stroke risk model for urban individuals. Our results revealed the importance of knee height and physical function predictors for stroke and highlighted the differences in determinants between urban and rural individuals, proposing targeted stroke prevention and control strategies in different populations in terms of physical function.</p> <p>Keywords: Stroke; Prediction; Machine learning; Urban and rural disparities; Middle-aged and elderly adults</p> <p>Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-025-91157-y.</p> <hd id="AN0183282862-2">Introduction</hd> <p>Stroke is a significant health issue both in China and worldwide. In 2020, the weighted prevalence and mortality rates of stroke in China were 2.6% and 343.4 per 100,000 person-years, with an estimated 17.8 million prevalent cases and 2.3 million deaths among the middle-aged and elderly population[<reflink idref="bib1" id="ref1">1</reflink>]. In recent years, stroke has become the leading cause of death in China, and China accounts for nearly one-third of global stroke deaths[<reflink idref="bib2" id="ref2">2</reflink>]. It is estimated that by 2035, the elderly population aged 60 and above will make up more than 30% of the total population in China, signalling the nation's entry into the era of heavy ageing[<reflink idref="bib3" id="ref3">3</reflink>]. 
Consequently, there will be an increase in both the incidence and mortality of stroke[<reflink idref="bib4" id="ref4">4</reflink>], posing challenges for stroke prevention and control. Therefore, it is imperative to take active and vigorous action to mitigate the substantial burden of this disease.</p> <p>While stroke prevention and control plans and goals are already established in China, attaining these objectives requires targeted interventions and a thorough comprehension of early screening. However, early screening faces many barriers to implementation. From the patients' perspective, unrecognised or ignored stroke symptoms, unconsciousness, hearing problems, and inadequate knowledge hinder accurate stroke assessment[<reflink idref="bib5" id="ref5">5</reflink>],[<reflink idref="bib6" id="ref6">6</reflink>]. Physicians encounter challenges such as misdiagnosis, inadequate equipment, and personnel shortages[<reflink idref="bib7" id="ref7">7</reflink>]. Additionally, language barriers, diverse stroke symptoms, and confusion caused by alcohol or drugs further complicate early stroke screening[<reflink idref="bib8" id="ref8">8</reflink>],[<reflink idref="bib9" id="ref9">9</reflink>]. A study indicates notable disparities in prehospital delays between urban and rural areas, which are associated with longer distances from remote areas to stroke wards, reflecting differences in barriers to screening and diagnosis between urban and rural areas. Moreover, urban-rural disparities affect follow-up treatment access due to factors such as age and socioeconomic status[<reflink idref="bib10" id="ref10">10</reflink>]. Therefore, applying predictive models is particularly important to address these obstacles and reduce the impact of urban-rural disparities.</p> <p>By analysing a large amount of clinical data, predictive models can more accurately predict stroke risk and identify high-risk patients. 
The rising popularity of machine learning models in stroke prediction is attributed to their capability to tackle complex nonlinear relationships, interactions, and multicollinearity issues that traditional logistic regression cannot[<reflink idref="bib11" id="ref11">11</reflink>]. Machine learning algorithms do not require statistical inference or assumptions; they are also self-optimizing and adaptive, making them a more accurate and flexible tool for stroke risk prediction. Therefore, their efficiency in stroke prediction is higher[<reflink idref="bib12" id="ref12">12</reflink>], allowing physicians and public health workers to create more accurate prevention and control plans earlier[<reflink idref="bib13" id="ref13">13</reflink>]. With the continuous development of machine learning technologies, we have more ways and means to construct and optimise predictive models, offering new possibilities to promote early prediction of stroke and reduce the disease burden.</p> <p>Previous studies have attempted to apply machine learning algorithms to predict stroke risk. A bibliometric analysis showed that most studies have focused on using machine learning to improve stroke risk prediction, diagnosis, and outcome prediction[<reflink idref="bib14" id="ref14">14</reflink>]. Among studies of stroke risk prediction in the general population, some focused on laboratory variables such as blood biomarkers, urine biomarkers and genetic variables[<reflink idref="bib15" id="ref15">15</reflink>],[<reflink idref="bib16" id="ref16">16</reflink>]. Others included sociodemographic characteristics, lifestyle factors, diseases, physical examination measurements and blood biomarkers[<reflink idref="bib17" id="ref17">17</reflink>], [<reflink idref="bib18" id="ref18">18</reflink>], [<reflink idref="bib19" id="ref19">19</reflink>], [<reflink idref="bib20" id="ref20">20</reflink>], [<reflink idref="bib21" id="ref21">21</reflink>]–[<reflink idref="bib22" id="ref22">22</reflink>]. 
One of the studies lacked physical examination measurements and blood biomarkers[<reflink idref="bib17" id="ref23">17</reflink>], and one lacked lifestyle factors[<reflink idref="bib18" id="ref24">18</reflink>]. Most studies include a limited number of variables per component, and most physical examination measurements primarily involve body measurements such as BMI, blood pressure, hip circumference, and waist circumference. Only Chang et al. investigated grip strength[<reflink idref="bib22" id="ref25">22</reflink>], which may have a significant impact on the risk of cardiovascular disease[<reflink idref="bib23" id="ref26">23</reflink>]. A study showed that ideal cardiovascular health is related to better physical function[<reflink idref="bib24" id="ref27">24</reflink>]. Notably, there is a lack of research exploring the connection between stroke and other physical functions. In the meantime, stroke risk varies between urban and rural areas. The prevalence of stroke was observed to be marginally higher in urban settings (2.7%) compared to rural areas (2.5%). However, both the incidence rate (485.5 vs. 520.8 per 100,000 person-years) and mortality rate (309.9 vs. 369.7 per 100,000 person-years) were significantly lower in urban locales[<reflink idref="bib1" id="ref28">1</reflink>]. Moreover, no studies have conducted subgroup analyses to compare the differences in urban and rural prediction.</p> <p>Given that current studies have rarely considered the role of physical examination measurements, such as physical function, in stroke and have not examined urban-rural differences in stroke prediction, we used self-reported data, physical examination measurements containing physical function variables, and blood biomarkers to create stroke prediction models. 
We also conducted urban-rural subgroup analyses to discover urban and rural differences in determinants among middle-aged and older adults.</p> <hd id="AN0183282862-3">Results</hd> <p></p> <hd id="AN0183282862-4">Characteristics of the study data</hd> <p>Our study included a total of 9,413 participants aged 45 years and above, with 5,033 females and 4,374 males, 7,861 from rural areas and 1,506 from urban areas. In the total population, the median age was 59 years, with an interquartile range (IQR) of 52-66 years. The rural subgroup had a median age of 58 years (IQR 52-65 years), whereas the urban subgroup had a median age of 60 years (IQR 54-67 years). The occurrence of stroke was 2.1% in the total population, 1.9% in rural subgroup, and 3.4% in urban subgroup. Detailed sociodemographic data and other related information can be seen in Table 1 and Supplementary Table 1.</p> <p>Table 1 Characteristics of study participants.</p> <p> <ephtml> <table frame="hsides" rules="groups"><thead><tr><th align="left"><p>Characteristic</p></th><th align="left"><p>Total population</p><p><italic>N</italic> = 9,413</p></th><th align="left"><p>Rural subgroup</p><p><italic>N</italic> = 7,861</p></th><th align="left"><p>Urban subgroup</p><p><italic>N</italic> = 1,506</p></th></tr></thead><tbody><tr><td align="left" colspan="4"><p>Gender</p></td></tr><tr><td align="left"><p> Female</p></td><td align="left"><p>5,033 (53.5%)</p></td><td align="left"><p>4,287 (54.5%)</p></td><td align="left"><p>723 (48.0%)</p></td></tr><tr><td align="left"><p> Male</p></td><td align="left"><p>4,374 (46.5%)</p></td><td align="left"><p>3,568 (45.4%)</p></td><td align="left"><p>783 (52.0%)</p></td></tr><tr><td align="left"><p> Missing</p></td><td align="left"><p>6 (0.1%)</p></td><td align="left"><p>6 (0.1%)</p></td><td align="left"><p>0 (0.0%)</p></td></tr><tr><td align="left"><p>Age</p></td><td align="left"><p>59 (52,66)</p></td><td align="left"><p>58 (52,65)</p></td><td align="left"><p>60 
(54,67)</p></td></tr><tr><td align="left" colspan="4"><p>Education</p></td></tr><tr><td align="left"><p> Elementary school or lower</p></td><td align="left"><p>6,671 (70.9%)</p></td><td align="left"><p>6,024 (76.6%)</p></td><td align="left"><p>623 (41.4%)</p></td></tr><tr><td align="left"><p> Middle school</p></td><td align="left"><p>1,841 (19.6%)</p></td><td align="left"><p>1,394 (17.7%)</p></td><td align="left"><p>438 (29.1%)</p></td></tr><tr><td align="left"><p> High school or vocational school</p></td><td align="left"><p>779 (8.3%)</p></td><td align="left"><p>425 (5.4%)</p></td><td align="left"><p>341 (22.6%)</p></td></tr><tr><td align="left"><p> College or higher</p></td><td align="left"><p>120 (1.3%)</p></td><td align="left"><p>16 (0.2%)</p></td><td align="left"><p>104 (6.9%)</p></td></tr><tr><td align="left"><p> Missing</p></td><td align="left"><p>2 (0.0%)</p></td><td align="left"><p>2 (0.0%)</p></td><td align="left"><p>0 (0.0%)</p></td></tr><tr><td align="left" colspan="4"><p>Marital status</p></td></tr><tr><td align="left"><p> Married</p></td><td align="left"><p>8,252 (87.7%)</p></td><td align="left"><p>6,879 (87.5%)</p></td><td align="left"><p>1,328 (88.2%)</p></td></tr><tr><td align="left"><p> Not married, separated, divorced or widowed</p></td><td align="left"><p>1,161 (12.3%)</p></td><td align="left"><p>982 (12.5%)</p></td><td align="left"><p>178 (11.8%)</p></td></tr><tr><td align="left" colspan="4"><p>Number of marriages</p></td></tr><tr><td align="left"><p> 0 time</p></td><td align="left"><p>6 (0.1%)</p></td><td align="left"><p>5 (0.1%)</p></td><td align="left"><p>1 (0.1%)</p></td></tr><tr><td align="left"><p> 1 time</p></td><td align="left"><p>8,954 (95.1%)</p></td><td align="left"><p>7,496 (95.4%)</p></td><td align="left"><p>1,415 (94.0%)</p></td></tr><tr><td align="left"><p> 2–6 times</p></td><td align="left"><p>384 (4.1%)</p></td><td align="left"><p>298 (3.8%)</p></td><td align="left"><p>83 (5.5%)</p></td></tr><tr><td align="left"><p> 
Missing</p></td><td align="left"><p>69 (0.7%)</p></td><td align="left"><p>62 (0.8%)</p></td><td align="left"><p>7 (0.5%)</p></td></tr><tr><td align="left" colspan="4"><p>Standard of living</p></td></tr><tr><td align="left"><p> Poor</p></td><td align="left"><p>1,185 (12.6%)</p></td><td align="left"><p>1,070 (13.6%)</p></td><td align="left"><p>111 (7.4%)</p></td></tr><tr><td align="left"><p> Relatively poor</p></td><td align="left"><p>2,988 (31.7%)</p></td><td align="left"><p>2,445 (31.1%)</p></td><td align="left"><p>529 (35.1%)</p></td></tr><tr><td align="left"><p> Average</p></td><td align="left"><p>4,848 (51.5%)</p></td><td align="left"><p>4,022 (51.2%)</p></td><td align="left"><p>799 (53.1%)</p></td></tr><tr><td align="left"><p> Relatively high</p></td><td align="left"><p>233 (2.5%)</p></td><td align="left"><p>188 (2.4%)</p></td><td align="left"><p>45 (3.0%)</p></td></tr><tr><td align="left"><p> Very high</p></td><td align="left"><p>19 (0.2%)</p></td><td align="left"><p>17 (0.2%)</p></td><td align="left"><p>2 (0.1%)</p></td></tr><tr><td align="left"><p> Missing</p></td><td align="left"><p>140 (1.5%)</p></td><td align="left"><p>119 (1.5%)</p></td><td align="left"><p>20 (1.3%)</p></td></tr><tr><td align="left" colspan="4"><p>Stroke</p></td></tr><tr><td align="left"><p> No</p></td><td align="left"><p>9,212 (97.9%)</p></td><td align="left"><p>7,711 (98.1%)</p></td><td align="left"><p>1,455 (96.6%)</p></td></tr><tr><td align="left"><p> Yes</p></td><td align="left"><p>201 (2.1%)</p></td><td align="left"><p>150 (1.9%)</p></td><td align="left"><p>51 (3.4%)</p></td></tr></tbody></table> </ephtml> </p> <p>Note: Categorical variables are presented as number of participants (%); numeric variables are presented as the median (25%,75%).</p> <hd id="AN0183282862-5">Prediction of stroke</hd> <p>The receiver operating characteristic curves of the nine machine learning algorithms applied to the predictive models were depicted in Fig. 1. 
GNB demonstrated the highest predictive performance (AUC = 0.76), followed by LR (AUC = 0.75), LGBM (AUC = 0.71), XGBoost (AUC = 0.71), RF (AUC = 0.70), GBM (AUC = 0.68), AdaBoost (AUC = 0.64), KNN (AUC = 0.61), and SVM (AUC = 0.51). In subgroup analysis, LR and GNB achieved the highest AUC of 0.76 in the rural subgroup, followed by RF (AUC = 0.67), AdaBoost (AUC = 0.64), XGBoost (AUC = 0.64), GBM (AUC = 0.60), KNN (AUC = 0.60), LGBM (AUC = 0.59) and SVM (AUC = 0.54). LR achieved the highest AUC of 0.67 in the urban subgroup, followed by LGBM (AUC = 0.65), KNN (AUC = 0.63), SVM (AUC = 0.62), GNB (AUC = 0.61), XGBoost (AUC = 0.59), RF (AUC = 0.55), AdaBoost (AUC = 0.54), and GBM (AUC = 0.47).</p> <p>Graph: Fig. 1 Receiver operating characteristic curve performance of stroke risk prediction in (a) total population, (b) rural subgroup, (c) urban subgroup. AUC area under the curve, LR logistic regression, AdaBoost adaptive boosting classifier, SVM support vector machines, XGBoost extreme gradient boosting, RF random forest, GNB Gaussian naive Bayes, GBM gradient boosting machine, LGBM light gradient boosting decision machine, KNN K Nearest Neighbours.</p> <hd id="AN0183282862-6">Important predictors of stroke</hd> <p>The time taken for repeated chair stands, the chair height from floor to seat, knee height, CRE, complete repeated chair stands, MCV, PLT, UA, BMI, and WBC were the ten most important variables in predicting stroke in the total population (Fig. 2a). In the analysis of the rural subgroup, the time taken for repeated chair stands, CRE, PLT, the chair height from floor to seat, knee height, complete repeated chair stands, pulse, WBC, maintaining semi-tandem balance statically, and UA were the ten most important variables in predicting stroke (Fig. 2b). 
In the analysis of the urban subgroup, the time taken for repeated chair stands, MCV, maintaining semi-tandem balance statically, UA, right-hand grip strength, age, BUN, use of trunk, arms, legs for semi-tandem balance, number of marriages, and night sleep duration were the ten most important variables in predicting stroke (Fig. 2c). Our study also emphasized the differences between the urban and rural subgroups (Fig. 2d). The time taken for repeated chair stands was more critical in the stroke risk model for rural individuals. UA and maintaining semi-tandem balance statically were more critical in the stroke risk model for urban individuals. Variables that did not appear in the top ten rankings of both the urban and rural subgroups were not compared directly.</p> <p>Graph: Fig. 2 The top 10 predictors in the prediction of stroke by the best model in (a) total population, (b) rural subgroup, (c) urban subgroup, and (d) comparison between groups. BMI body mass index, UA uric acid, PLT platelet, MCV mean corpuscular volume, CRE creatinine, BUN blood urea nitrogen, WBC white blood cell.</p> <hd id="AN0183282862-7">Discussion</hd> <p>To our knowledge, this study is the first to utilize machine learning algorithms to develop stroke prediction models for urban and rural populations in China. Our findings revealed that machine learning algorithms, which were based on comprehensive data collected from self-reported questionnaires, physical examinations and clinical measurements, predicted stroke at an acceptable level in individuals aged 45 years and above. Our results showed that traditional LR demonstrated superior predictive performance across diverse populations. 
Our study also demonstrated the importance of physical function predictors collected from physical examination, such as balance abilities and hand grip strength, and the potential predictive value of knee height for stroke, providing new possibilities for prevention and control measures.</p> <p>Our results showed that machine learning predictive models in the rural subgroup performed better than those in the urban subgroup, reaching acceptable levels. This implies that the machine learning algorithms predict the risk of stroke more accurately in the rural subgroup. The variations in lifestyle, health consciousness, and healthcare accessibility between the two populations could account for these results. In China, rural populations aged 45 years and above tend to exhibit a higher frequency of established stroke risk factors, such as smoking and excessive alcohol consumption, compared to urban individuals. Additionally, rural areas face challenges in chronic disease management[<reflink idref="bib25" id="ref29">25</reflink>]. However, no machine learning predictive models of stroke have been developed specifically for rural and urban areas. In contrast to other studies, we considered such urban-rural disparities in our study design and analysis, which allowed our models to circumvent obstacles in stroke recognition effectively and facilitated the formulation of more precise and personalized preventive measures tailored to distinct regional populations.</p> <p>Our findings highlighted the significance of physical function measurements in physical examination, which are related to ideal cardiovascular health[<reflink idref="bib24" id="ref30">24</reflink>]. According to the guidelines for the prevention and treatment of stroke in China (2021 Edition)[<reflink idref="bib26" id="ref31">26</reflink>], risk factors for stroke are categorized as intervenable and non-intervenable. Non-intervenable factors mainly include age, gender, race, and genetic factors. 
Intervenable factors include hypertension, glucose metabolism disorders, dyslipidemia, heart disease, asymptomatic carotid atherosclerosis, and lifestyle. Our findings further emphasized the significance of balance abilities and hand grip strength. Concerning balance abilities, we used three tests to analyse the predictive value of balance as a risk factor for stroke: semi-tandem balance, full-tandem balance and repeated chair stands. The last requires lower limb strength and effective balance control[<reflink idref="bib27" id="ref32">27</reflink>]. In a study of stroke risk in people with disabilities, those with balance disorders had the highest risk of stroke[<reflink idref="bib28" id="ref33">28</reflink>], further supporting the importance of balance in predicting stroke risk. The central nervous system may have an impact on stroke risk, as demonstrated by the ability to maintain balance during daily exercise[<reflink idref="bib28" id="ref34">28</reflink>]. Therefore, by assessing and improving balance, we may be able to better predict and reduce the risk of stroke. Most studies on balance have been limited to its effect on stroke prognosis, and further studies are needed to elucidate its relationship with stroke onset and the underlying mechanisms. Moreover, muscle mass has a significant impact on the risk of cardiovascular disease. Research indicates a strong correlation between grip strength and the incidence of coronary heart disease and intracerebral infarction. Furthermore, muscle strength in early adulthood predicts the later risk of heart disease and stroke[<reflink idref="bib23" id="ref35">23</reflink>]. These results open new avenues for targeted interventions. Additionally, we identified knee height as a new predictor. 
The upbringing and living conditions during childhood may affect the risk of certain chronic diseases in later life, and limb length may be an important indicator to observe[<reflink idref="bib29" id="ref36">29</reflink>]. Two studies showed that knee height was correlated with diabetes[<reflink idref="bib29" id="ref37">29</reflink>],[<reflink idref="bib30" id="ref38">30</reflink>], and there have been no studies on the relationship between knee height and stroke. The findings complemented those of earlier studies and drew our attention to the potential impact of childhood experiences on stroke.</p> <p>Furthermore, our study emphasized urban-rural differences in key predictors of stroke. For physical function variables, repeated chair stands test was more critical in the stroke risk model for rural individuals, while balance measurements of semi-tandem were more important in the stroke risk model for urban individuals. We also noted the importance of right-hand grip strength in urban populations. The older adults living in rural areas engage in more physical activities than those living in urban areas[<reflink idref="bib31" id="ref39">31</reflink>]. Rural individuals have improved their physical fitness due to long periods of agricultural labour[<reflink idref="bib32" id="ref40">32</reflink>], while urban individuals benefit from the convenience of transportation and amenities, which save them from many chores that would otherwise require physical strength. This leads to differences in physical function in elderly adults in the two areas. For some other variables that did not appear in the top ten variable importance in both subgroups, the reasons for the difference in variable importance rankings may be as follows. The differences in upbringing and living conditions between urban and rural areas may affect knee height[<reflink idref="bib29" id="ref41">29</reflink>], potentially influencing stroke risk. 
Societal attitudes contribute to variations in the number of marriages, and disparities in lifestyle and stress levels impact the night sleep duration among individuals in these areas. These findings underscored the need for tailored stroke prevention and intervention strategies based on the specific risk profiles of urban and rural individuals. For rural individuals, promoting physical activities and exercises specifically targeting lower body strength and endurance is important, given the significance of factors related to repeated chair stands. On the other hand, for urban individuals, implementing balance and upper body strength exercises can help improve semi-tandem balance and hand grip strength due to the importance of neuromuscular coordination and muscle strength. In addition, promoting healthy sleep habits is crucial, as sleep may be linked to overall health and the risk of stroke.</p> <hd id="AN0183282862-8">Limitations</hd> <p>This study has some limitations. First, the study utilised cross-sectional data, which can only estimate the present likelihood of illness, not the future probability of illness. It can also only show the association between the predictors and outcome but not the cause-effect relationship. However, we can argue that the strong association between the identified predictors and outcome could be useful as an indicator in the predictive model. Secondly, although this study combined data from the self-reported questionnaires, physical examinations, and clinical measurements, it did not include all possible factors. Other factors that were also associated with stroke, such as urine markers, could not be included in the model because they were not collected in the CHARLS. In addition, some predictors were self-reported and can therefore be influenced by participants' recall bias. 
Thirdly, our study noted new predictive variables, such as upper arm length, but the mechanism of action between them and stroke lacks in-depth confirmation, and future studies should further dissect this relationship. Finally, as the relationships that are valid in the Chinese population may not apply to other populations, the results of our study will require external validation in other populations to guarantee the practicality of the identified predictors.</p> <hd id="AN0183282862-9">Methods</hd> <p></p> <hd id="AN0183282862-10">Study design and participants</hd> <p>Our research utilised data from the China Health and Retirement Longitudinal Study (CHARLS), a nationally representative cohort study conducted from 2011 to 2018. The CHARLS survey used a multistage sampling strategy that covered 28 provinces, 150 counties or districts, and 450 villages or urban communities in China[<reflink idref="bib33" id="ref42">33</reflink>]. The study was approved by the Biomedical Ethics Review Committee of Peking University (approval number IRB00001052-11015), and all participants provided written informed consent before being included in the study. All methods were carried out in accordance with relevant guidelines and regulations. Our study used baseline data from 17,705 participants in 2011. We included 9,413 individuals for the study after excluding those who were (1) missing stroke data, (2) missing age data or aged < 45 years, or (3) lacking physical examination measurements or blood biomarker information.</p> <p>In addition, participants' family background (urban or rural) was determined by their Hukou status. Hukou is a Chinese household registration system that classifies citizens as rural or urban individuals based on their parents' Hukou registration. 
It is linked to the implementation of many social programs and is essential for accessing government resources[<reflink idref="bib34" id="ref46">34</reflink>],[<reflink idref="bib35" id="ref47">35</reflink>]. In the subgroup analyses, we assigned those with agricultural Hukou to the rural subgroup (<emph>n</emph> = 7,861) and those with non-agricultural Hukou to the urban subgroup (<emph>n</emph> = 1,506), based on the "household registration type" question. The remaining individuals, who (<reflink idref="bib1" id="ref48">1</reflink>) did not have Hukou, (<reflink idref="bib2" id="ref49">2</reflink>) had unified residence Hukou, or (<reflink idref="bib3" id="ref50">3</reflink>) were missing "household registration type" data, were not included in the subgroup analyses. Details of data inclusion and exclusion are shown in Fig. 3.</p>
<p>Graph: Fig. 3 Participant encounter inclusion and exclusion diagram.</p>
<hd id="AN0183282862-11">Predictors and outcome</hd>
<p>Our study selected a wide range of predictors, including self-reported data, physical examination measurements, and blood biomarkers (see Table 2). The complete set of predictors of stroke is shown in Supplementary Table 2. The outcome was stroke, which was ascertained through participants' responses to the self-reported question, "Have you been diagnosed with a stroke by a doctor?" Subgroup analyses were conducted separately for the urban and rural subpopulations defined by the Hukou criteria. 
Ultimately, we developed stroke prediction models for the total population and subgroups using three categories of predictors.</p>
<p>Table 2 Selected predictors of stroke.</p>
<p> <ephtml> <table frame="hsides" rules="groups">
<thead><tr><th align="left"><p>Categories</p></th><th align="left"><p>Variable types</p></th><th align="left"><p>Variable description</p></th></tr></thead>
<tbody>
<tr><td align="left" colspan="2"><p>Self-reported data</p></td><td align="left"><p>Gender, age, night sleep duration, hypertension, dyslipidemia, diabetes, struggling with body pains</p></td></tr>
<tr><td align="left" rowspan="2"><p>Physical examination measurements</p></td><td align="left"><p>Body measurements</p></td><td align="left"><p>Upper arm length, knee height, waist circumference, body mass index (BMI)</p></td></tr>
<tr><td align="left"><p>Physical function measurements</p></td><td align="left"><p>(1) Balance measurements of semi-tandem<sup>a</sup>: maintain semi-tandem balance statically; use trunk, arms, legs for semi-tandem balance</p><p>(2) Balance measurements of full-tandem<sup>b</sup>: maintain full-tandem balance statically; use trunk, arms, legs for full-tandem balance</p><p>(3) Repeated chair stands test<sup>c</sup>: complete repeated chair stands; the time taken for repeated chair stands; the chair height from floor to seat; use trunk arms during repeated chair stands</p><p>(4) Muscle strength: left-hand grip strength, right-hand grip strength</p></td></tr>
<tr><td align="left" colspan="2"><p>Blood biomarkers</p></td><td align="left"><p>Mean corpuscular volume (MCV), platelet (PLT), blood urea nitrogen (BUN)</p></td></tr>
</tbody></table> </ephtml> </p>
<p> <sups>a</sups>Balance measurements of semi-tandem: stand with the side of the heel of one foot touching the big toe of the other foot for about 10 seconds. <sups>b</sups>Balance measurements of full-tandem: stand with the heel of one foot directly in front of and touching the toes of the other foot for about [30/60] seconds. 
30 seconds for participants aged 70 years or above; 60 seconds for those younger than 70 years. <sups>c</sups>Repeated chair stands test: stand up straight and then sit down again five times at the participant's fastest pace, without stopping in between and without using the arms to push off.</p>
<hd id="AN0183282862-12">Analysis</hd>
<p></p>
<hd id="AN0183282862-13">Data processing</hd>
<p>We used R 4.1.3 for data processing. Variables with less than 30% missing values were imputed using the mice package with the random forest method. The dataset was partitioned into training and testing sets using an 80%-20% split. To ensure reproducible results, random seeds were set during data splitting, and the data order was shuffled to minimize sample correlation. The numbers of negative and positive stroke outcomes were markedly disproportionate, creating a potential class imbalance. Such imbalance could bias the model towards predicting the majority class, possibly causing overfitting or compromising predictive accuracy. We therefore resampled the imbalanced dataset: specifically, we applied oversampling techniques to the training dataset to achieve a more balanced representation of negative and positive outcomes.</p>
<hd id="AN0183282862-14">Machine learning algorithms</hd>
<p>We utilised nine common machine learning algorithms, namely logistic regression (LR), adaptive boosting classifier (AdaBoost), support vector machines (SVM), extreme gradient boosting (XGBoost), random forest (RF), Gaussian naive Bayes (GNB), gradient boosting machine (GBM), light gradient boosting decision machine (LGBM), and K Nearest Neighbours (KNN), to construct risk prediction models for stroke. The machine learning algorithms were implemented in Python 3.8.12. LR, AdaBoost, SVM, RF, GNB, GBM, LGBM and KNN were built using the scikit-learn library in Python. XGBoost was built using the XGBoost library in Python. 
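</p>
<p>The data-processing steps described above (a seeded, shuffled 80%-20% split followed by oversampling of the training set only) and the AUC used to evaluate the models can be illustrated with a minimal Python sketch. This is not the authors' code: the study does not name its oversampling technique, so simple random oversampling is assumed here, and the function names (shuffled_split, random_oversample, auc), the seed value, and the synthetic data are all hypothetical.</p>

```python
import random
from collections import Counter


def shuffled_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows with a fixed seed, then take the first test_fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, testing set)


def random_oversample(rows, seed=42):
    """Assumed technique: duplicate random minority-class rows until classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for feature, label in rows:
        by_class.setdefault(label, []).append((feature, label))
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend(group + extra)
    rng.shuffle(balanced)
    return balanced


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return correct / (len(positives) * len(negatives))


# Synthetic stand-in data (not CHARLS): one numeric feature, 5% positive outcomes.
data_rng = random.Random(0)
rows = [(data_rng.gauss(12.0 if y else 10.0, 2.0), y) for y in [1] * 50 + [0] * 950]

train, test = shuffled_split(rows)         # 80%-20% split with a fixed seed
balanced_train = random_oversample(train)  # oversample the training set only

print(Counter(label for _, label in train))
print(Counter(label for _, label in balanced_train))
# The raw feature serves as a stand-in risk score; a fitted model would supply this instead.
print(round(auc([y for _, y in test], [x for x, _ in test]), 2))
```

<p>In this sketch the oversampling happens after the split, so duplicated rows never reach the testing set, which is consistent with the statement that resampling was applied to the training dataset.</p>
<p>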
To identify the best hyperparameters for the machine learning algorithms, we employed five-fold cross-validation together with Bayesian optimization.</p>
<hd id="AN0183282862-15">Performance measurement</hd>
<p>Model performance was evaluated using the area under the curve (AUC), that is, the area under the receiver operating characteristic (ROC) curve. A higher AUC value indicates better predictive performance of the model[<reflink idref="bib36" id="ref51">36</reflink>]. An AUC value ranging from 0.7 to 0.8 is considered acceptable, 0.8 to 0.9 is regarded as excellent, and any value exceeding 0.9 is categorized as outstanding[<reflink idref="bib11" id="ref52">11</reflink>].</p>
<hd id="AN0183282862-16">Statistical descriptive analysis</hd>
<p>Numerical data were described using the median (M) and interquartile range (IQR). Categorical data were described using frequencies and percentages. All statistical analyses were performed in R 4.1.3.</p>
<hd id="AN0183282862-17">Acknowledgements</hd>
<p>This work was supported by the Artificial Intelligence-Driven Reform of Scientific Research Paradigms to Empower Disciplinary Advancement Program, Shanghai University of Traditional Chinese Medicine in 2021 (2021LK008), and Shanghai University of Traditional Chinese Medicine in 2024 (KECJ2024019). The funders were not involved in the study design, data collection, analysis, decision to publish, or manuscript preparation. The authors extend their thanks to the CHARLS team and all participants for their valuable time and effort in enabling this study.</p>
<hd id="AN0183282862-18">Author contributions</hd>
<p>ZJ and XX conceived and designed the study. XX and ZJ established the models and coding. ZJ and XX contributed to data cleaning. ZJ wrote the first draft and edited the manuscript. LL, SL, SH and ZH contributed to the manuscript revision. 
All authors contributed to the preparation of the manuscript and approved the final manuscript.</p>
<hd id="AN0183282862-19">Data availability</hd>
<p>The datasets generated and/or analysed during the current study are available in the CHARLS repository, https://charls.pku.edu.cn/.</p>
<hd id="AN0183282862-20">Declarations</hd>
<p></p>
<hd id="AN0183282862-21">Competing interests</hd>
<p>The authors declare no competing interests.</p>
<hd id="AN0183282862-22">Electronic supplementary material</hd>
<p>Below is the link to the electronic supplementary material.</p>
<p>Graph: Supplementary Material 1</p>
<hd id="AN0183282862-23">Publisher's note</hd>
<p>Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
<ref id="AN0183282862-24"> <title> References </title>
<blist> <bibl id="bib1" idref="ref1" type="bt">1</bibl> <bibtext> Tu WJ. Estimated burden of stroke in China in 2020. JAMA Netw. Open. 2023; 6: e231455. 10.1001/jamanetworkopen.2023.1455. 36862407. 9982699</bibtext> </blist>
<blist> <bibl id="bib2" idref="ref2" type="bt">2</bibl> <bibtext> Wang W. Prevalence, incidence, and mortality of stroke in China: results from a nationwide Population-Based survey of 480 687 adults. Circulation. 2017; 135: 759-771. 10.1161/CIRCULATIONAHA.116.025250. 28052979</bibtext> </blist>
<blist> <bibl id="bib3" idref="ref3" type="bt">3</bibl> <bibtext> Division, A. Transcript of the September 20, 2022 press conference by the NHSRC. <ulink href="http://www.nhc.gov.cn/xcs/s3574/202209/ee4dc20368b440a49d270a228f5b0ac1.shtml">http://www.nhc.gov.cn/xcs/s3574/202209/ee4dc20368b440a49d270a228f5b0ac1.shtml</ulink> (2022).</bibtext> </blist>
<blist> <bibl id="bib4" idref="ref4" type="bt">4</bibl> <bibtext> Wu S. Stroke in China: advances and challenges in epidemiology, prevention, and management. Lancet Neurol. 2019; 18: 394-405. 10.1016/S1474-4422(18)30500-3. 30878104. 
</bibtext> </blist>
<blist> <bibl id="bib5" idref="ref5" type="bt">5</bibl> <bibtext> Bakke I, Lund CG, Carlsson M, Salvesen R, Normann B. Barriers to and facilitators for making emergency calls - a qualitative interview study of stroke patients and witnesses. J. Stroke Cerebrovasc. Dis. 2022; 31: 106734. 10.1016/j.jstrokecerebrovasdis.2022.106734. 36037678</bibtext> </blist>
<blist> <bibl id="bib6" idref="ref6" type="bt">6</bibl> <bibtext> Mackay E, Theron E, Stassen W. The barriers and facilitators to the telephonic application of the FAST assessment for stroke in a private emergency dispatch centre in South Africa. Afr. J. Emerg. Med. 2021; 11: 15-19. 10.1016/j.afjem.2020.11.002. 33318913</bibtext> </blist>
<blist> <bibl id="bib7" idref="ref7" type="bt">7</bibl> <bibtext> Meng Z. Development and validation of a LASSO prediction model for better identification of ischemic stroke: A Case-Control study in China. Front. Aging Neurosci. 2021; 13: 630437. 10.3389/fnagi.2021.630437. 34305566. 8296821</bibtext> </blist>
<blist> <bibl id="bib8" idref="ref8" type="bt">8</bibl> <bibtext> Zhao J, Liu R. Stroke 1-2-0: a rapid response programme for stroke in China. Lancet Neurol. 2017; 16: 27-28. 10.1016/S1474-4422(16)30283-6. 28029517</bibtext> </blist>
<blist> <bibl id="bib9" idref="ref9" type="bt">9</bibl> <bibtext> Hodell E. Paramedic perspectives on barriers to prehospital acute stroke recognition. Prehosp. Emerg. Care. 2016; 20: 415-424. 10.3109/10903127.2015.1115933. 26855299</bibtext> </blist>
<blist> <bibtext> Buus, S. et al. Urban-rural inequalities in IV thrombolysis for acute ischemic stroke: A nationwide study. Eur. Stroke J. 23969873241244591. https://doi.org/10.1177/23969873241244591 (2024).</bibtext> </blist>
<blist> <bibtext> Xu, X. et al. Web-Based Risk Prediction Tool for an Individual's Risk of HIV and Sexually Transmitted Infections Using Machine Learning Algorithms: Development and External Validation Study. J. Med. Internet Res. 24. https://doi.org/10.2196/37850 (2022).</bibtext> </blist>
<blist> <bibtext> Chahine, Y. et al. Machine learning and the conundrum of stroke risk prediction. Arrhythmia Electrophysiol. Rev. 12. https://doi.org/10.15420/aer.2022.34 (2023).</bibtext> </blist>
<blist> <bibtext> Xi Y, Wang H, Sun N. Machine learning outperforms traditional logistic regression and offers new possibilities for cardiovascular risk prediction: A study involving 143,043 Chinese patients with hypertension. Front. Cardiovasc. Med. 2022; 9: 1025705. 10.3389/fcvm.2022.1025705. 36451926. 9701715</bibtext> </blist>
<blist> <bibtext> Pahwa B, Tayal A, Garg K. Contributions of machine learning in the management of stroke: A bibliometric analysis of the 50 most cited articles. World Neurosurg. 2024; 184: 152-160. 10.1016/j.wneu.2024.01.059. 38244687</bibtext> </blist>
<blist> <bibtext> Alanazi EM, Abdou A, Luo J. Predicting risk of stroke from lab tests using machine learning algorithms: development and evaluation of prediction models. JMIR Form. Res. 2021; 5: e23440. 10.2196/23440. 34860663. 8686476</bibtext> </blist>
<blist> <bibtext> Theofilatos K, Korfiati A, Mavroudi S, Cowperthwaite MC, Shpak M. Discovery of stroke-related blood biomarkers from gene expression network models. BMC Med. Genomics. 2019; 12: 118. 1:CAS:528:DC%2BC1MXhsFCmsbfE. 10.1186/s12920-019-0566-8. 31391037. 6686563</bibtext> </blist>
<blist> <bibtext> Qiu Y. Development of rapid and effective risk prediction models for stroke in the Chinese population: a cross-sectional study. BMJ Open. 2023; 13: e068045. 10.1136/bmjopen-2022-068045. 36858471. 9980356</bibtext> </blist>
<blist> <bibtext> Lolak S, Attia J, McKay GJ, Thakkinstian A. Comparing explainable machine learning approaches with traditional statistical methods for evaluating stroke risk models: retrospective cohort study. JMIR Cardio. 2023; 7: e47736. 10.2196/47736. 37494080. 10413234</bibtext> </blist>
<blist> <bibtext> Dritsas, E. & Trigka, M. Stroke risk prediction with machine learning techniques. Sens. (Basel). 22. https://doi.org/10.3390/s22134670 (2022).</bibtext> </blist>
<blist> <bibtext> Hong C. Predictive accuracy of stroke risk prediction models across black and white race, sex, and age groups. JAMA. 2023; 329: 306-317. 10.1001/jama.2022.24683. 36692561. 10408266</bibtext> </blist>
<blist> <bibtext> Chun M. Stroke risk prediction using machine learning: a prospective cohort study of 0.5 million Chinese adults. J. Am. Med. Inform. Assoc. 2021; 28: 1719-1727. 10.1093/jamia/ocab068. 33969418. 8324240</bibtext> </blist>
<blist> <bibtext> Chang HW. Ischemic stroke prediction using machine learning in elderly Chinese population: the Rugao longitudinal ageing study. Brain Behav. 2023; 13: e3307. 1:CAS:528:DC%2BB3sXitlahtrzL. 10.1002/brb3.3307. 37934082. 10726889</bibtext> </blist>
<blist> <bibtext> Silventoinen K, Magnusson PK, Tynelius P, Batty GD, Rasmussen F. Association of body size and muscle strength with incidence of coronary heart disease and cerebrovascular diseases: a population-based cohort study of one million Swedish men. Int. J. Epidemiol. 2009; 38: 110-118. 10.1093/ije/dyn231. 19033357</bibtext> </blist>
<blist> <bibtext> Fan D. Cardiovascular health profiles, systemic inflammation, and physical function in older adults: A population-based study. Arch. Gerontol. Geriatr. 2023; 109: 104963. 10.1016/j.archger.2023.104963. 36804699</bibtext> </blist>
<blist> <bibtext> The National Bureau of Disease Control and Prevention. Report on the Nutrition and Chronic Diseases Status of Chinese Residents 2020. Beijing: The People's Medical Publishing House (2021).</bibtext> </blist>
<blist> <bibtext> National Health Commission of the People's Republic of China. Guidelines for the prevention and treatment of stroke in China (2021 Edition). <ulink href="http://www.nhc.gov.cn/yzygj/s3593/202108/50c4071a86df4bfd9666e9ac2aaac605/files/674273fa2ec049cc97ff89102c472155.pdf">http://www.nhc.gov.cn/yzygj/s3593/202108/50c4071a86df4bfd9666e9ac2aaac605/files/674273fa2ec049cc97ff89102c472155.pdf</ulink> (2021).</bibtext> </blist>
<blist> <bibtext> Medina-Mirapeix F, Crisostomo MJ, Martín San Agustín R, Sánchez-Martínez MP. Prognostic value of balance performance for improvements of community ambulation among stroke patients: a cohort study. Eur. J. Phys. Rehabil. Med. 2022; 58: 171-178. 10.23736/s1973-9087.21.06996-3. 34498829</bibtext> </blist>
<blist> <bibtext> Inchai P, Tsai WC, Chiu LT, Kung PT. Incidence, risk, and associated risk factors of stroke among people with different disability types and severities: A National population-based cohort study in Taiwan. Disabil. Health J. 2021; 14: 101165. 10.1016/j.dhjo.2021.101165. 34266788</bibtext> </blist>
<blist> <bibtext> He B. Upper arm length and knee height are associated with diabetes in the middle-aged and elderly: evidence from the China health and retirement longitudinal study. Public Health Nutr. 2023; 26: 190-198. 10.1017/s1368980022001215. 35581171</bibtext> </blist>
<blist> <bibtext> Palloni A, McEniry M, Wong R, Peláez M. The tide to come: elderly health in Latin America and the Caribbean. J. Aging Health. 2006; 18: 180-206. 10.1177/0898264305285664. 16614340</bibtext> </blist>
<blist> <bibtext> Zhu W, Chi A, Sun Y. Physical activity among older Chinese adults living in urban and rural areas: A review. J. Sport Health Sci. 2016; 5: 281-286. 10.1016/j.jshs.2016.07.004. 30356525. 6188614</bibtext> </blist>
<blist> <bibtext> Chen, X., Lin, Z., Gao, R., Yang, Y. & Li, L. Prevalence and associated factors of falls among older adults between urban and rural areas of Shantou City, China. Int. J. Environ. Res. Public Health 18. https://doi.org/10.3390/ijerph18137050 (2021).</bibtext> </blist>
<blist> <bibtext> Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: the China health and retirement longitudinal study (CHARLS). Int. J. Epidemiol. 2014; 43: 61-68. 10.1093/ije/dys203. 23243115</bibtext> </blist>
<blist> <bibtext> Wei JM, Li S, Claytor L, Partridge J, Goates S. Prevalence and predictors of malnutrition in elderly Chinese adults: results from the China health and retirement longitudinal study. Public Health Nutr. 2018; 21: 3129-3134. 10.1017/s1368980018002227. 30282567. 6316353</bibtext> </blist>
<blist> <bibtext> Wang G. Determinants of COVID-19 vaccination status and hesitancy among older adults in China. Nat. Med. 2023; 29: 623-631. 1:CAS:528:DC%2BB3sXjvFCmtL4%3D. 10.1038/s41591-023-02241-7. 36720270. 10285745</bibtext> </blist>
<blist> <bibtext> Du Z. Accurate prediction of coronary heart disease for patients with hypertension from electronic health records with big data and Machine-Learning methods: model development and performance evaluation. JMIR Med. Inf. 2020; 8: e17257. 
10.2196/17257</bibtext> </blist> </ref>
<aug> <p>By Jingjing Zhu; Luotao Lin; Lei Si; Hailei Zhao; Hualing Song and Xianglong Xu</p> </aug>
<nolink nlid="nl1" bibid="bib10" firstref="ref10"></nolink> <nolink nlid="nl2" bibid="bib11" firstref="ref11"></nolink> <nolink nlid="nl3" bibid="bib12" firstref="ref12"></nolink> <nolink nlid="nl4" bibid="bib13" firstref="ref13"></nolink> <nolink nlid="nl5" bibid="bib14" firstref="ref14"></nolink> <nolink nlid="nl6" bibid="bib15" firstref="ref15"></nolink> <nolink nlid="nl7" bibid="bib16" firstref="ref16"></nolink> <nolink nlid="nl8" bibid="bib17" firstref="ref17"></nolink> <nolink nlid="nl9" bibid="bib18" firstref="ref18"></nolink> <nolink nlid="nl10" bibid="bib19" firstref="ref19"></nolink> <nolink nlid="nl11" bibid="bib20" firstref="ref20"></nolink> <nolink nlid="nl12" bibid="bib21" firstref="ref21"></nolink> <nolink nlid="nl13" bibid="bib22" firstref="ref22"></nolink> <nolink nlid="nl14" bibid="bib23" firstref="ref26"></nolink> <nolink nlid="nl15" bibid="bib24" firstref="ref27"></nolink> <nolink nlid="nl16" bibid="bib25" firstref="ref29"></nolink> <nolink nlid="nl17" bibid="bib26" firstref="ref31"></nolink> <nolink nlid="nl18" bibid="bib27" firstref="ref32"></nolink> <nolink nlid="nl19" bibid="bib28" firstref="ref33"></nolink> <nolink nlid="nl20" bibid="bib29" firstref="ref36"></nolink> <nolink nlid="nl21" bibid="bib30" firstref="ref38"></nolink> <nolink nlid="nl22" bibid="bib31" firstref="ref39"></nolink> <nolink nlid="nl23" bibid="bib32" firstref="ref40"></nolink> <nolink nlid="nl24" bibid="bib33" firstref="ref42"></nolink> <nolink nlid="nl25" bibid="bib34" firstref="ref46"></nolink> <nolink nlid="nl26" bibid="bib35" firstref="ref47"></nolink> <nolink nlid="nl27" bibid="bib36" firstref="ref51"></nolink>
Header DbId: edsdoj
DbLabel: Directory of Open Access Journals
An: edsdoj.fc40e9819ab453e86d240501a5f4458
PubType: Academic Journal
PubTypeId: academicJournal
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Urban and rural disparities in stroke prediction using machine learning among Chinese older adults
– Name: TypeDocument
  Label: Document Type
  Group: TypDoc
  Data: article
– Name: Format
  Label: File Description
  Group: SrcInfo
  Data: electronic resource
– Name: Language
  Label: Language
  Group: Lang
  Data: English
– Name: ISSN
  Label: ISSN
  Group: ISSN
  Data: 2045-2322
– Name: NoteTitleSource
  Label: Relation
  Group: SrcInfo
  Data: https://doaj.org/toc/2045-2322
– Name: DOI
  Label: DOI
  Group: ID
  Data: 10.1038/s41598-025-91157-y
– Name: URL
  Label: Access URL
  Group: URL
  Data: <link linkTarget="URL" linkTerm="https://doaj.org/article/efc40e9819ab453e86d240501a5f4458" linkWindow="_blank">https://doaj.org/article/efc40e9819ab453e86d240501a5f4458</link>
– Name: AN
  Label: Accession Number
  Group: ID
  Data: edsdoj.fc40e9819ab453e86d240501a5f4458
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1038/s41598-025-91157-y
    Languages:
      – Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 9
        StartPage: 1
    Subjects:
      – SubjectFull: Stroke
        Type: general
      – SubjectFull: Prediction
        Type: general
      – SubjectFull: Machine learning
        Type: general
      – SubjectFull: Urban and rural disparities
        Type: general
      – SubjectFull: Middle-aged and elderly adults
        Type: general
      – SubjectFull: Medicine
        Type: general
      – SubjectFull: Science
        Type: general
    Titles:
      – TitleFull: Urban and rural disparities in stroke prediction using machine learning among Chinese older adults
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Jingjing Zhu
      – PersonEntity:
          Name:
            NameFull: Luotao Lin
      – PersonEntity:
          Name:
            NameFull: Lei Si
      – PersonEntity:
          Name:
            NameFull: Hailei Zhao
      – PersonEntity:
          Name:
            NameFull: Hualing Song
      – PersonEntity:
          Name:
            NameFull: Xianglong Xu
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 02
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 20452322
          Numbering:
            – Type: volume
              Value: 15
            – Type: issue
              Value: 1
          Titles:
            – TitleFull: Scientific Reports
              Type: main