Machine Learning Analysis of Retrospective Data From 503 Hospitalized Older Patients With Type 2 Diabetes to Identify Factors Associated With Cognitive Impairment

Mingzhu Yu; Jianfeng Zhang; Haigeng Chen; Guiyue Li

doi:10.12659/MSM.949864

10 January 2026: Database Analysis

Mingzhu Yu^{ABCDEFG 1,2*}, Jianfeng Zhang^{A 1}, Haigeng Chen^{A 3}, Guiyue Li^{A 4}

Med Sci Monit 2026; 32:e949864

Abstract

BACKGROUND: Diabetes is increasingly prevalent among older adults; mild cognitive impairment (MCI) comorbidity in this group represents a major concern. Existing MCI prediction methods are often inaccurate, but machine learning (ML) offers improved potential. This study aimed to identify factors associated with MCI through ML analysis of retrospective data from hospitalized older patients with type 2 diabetes mellitus (T2DM).

MATERIAL AND METHODS: This retrospective study analyzed data from 503 inpatients older than 60 years with T2DM. Patients were classified into MCI (n=102) and normal (n=401) groups based on Mini-Mental State Examination scores. To minimize overfitting and maximize data utilization, 5-fold cross-validation was used for model training and evaluation. Least absolute shrinkage and selection operator regression identified 8 core predictors from clinical data. Logistic regression, eXtreme Gradient Boosting (XGBoost), and random forest algorithms were employed to construct predictive models. Receiver operating characteristic (ROC) curves were used to compare model performance.

RESULTS: Key predictors of early MCI included age, body mass index, glycated hemoglobin, C-reactive protein, waist-to-height ratio, presence of diabetic complications, diabetes duration exceeding 5 years, and low education level. The XGBoost model outperformed other algorithms in ROC analysis: area under the curve, 0.892±0.032; accuracy, 0.851±0.028; sensitivity, 0.843±0.031; specificity, 0.859±0.029; and F1 score, 0.834±0.033.

CONCLUSIONS: The XGBoost model, incorporating these identified factors, demonstrated optimal predictive performance for MCI in older patients with T2DM. It may aid clinical risk stratification and provide a quantitative foundation for early intervention.

Keywords: Cognitive Dysfunction, Diabetes Mellitus, Type 2, machine learning, Humans, Female, Retrospective Studies, Male, Aged, Boosting Machine Learning Algorithms, Predictive Learning Models, ROC Curve, Risk Factors, Logistic Models, Middle Aged, Classification Algorithms, random forest, prediction algorithms, Hospitalization, Aged, 80 and over, Algorithms, Data Analytics

Introduction

Older patients with diabetes face the dual burdens of metabolic dysfunction and an elevated risk of cognitive decline in the context of global aging. The prevalence of diabetes exceeds 20% among adults older than 60 years, and individuals with type 2 diabetes mellitus (T2DM) in this age group have a 2- to 3-fold higher risk of cognitive decline, particularly mild cognitive impairment (MCI) [1]. Chronic hyperglycemia accelerates cognitive deterioration through mechanisms such as cerebrovascular endothelial injury, neuronal apoptosis, and neuroinflammation [2], establishing these patients as a high-risk cohort for MCI. The association between T2DM and cognitive impairment is well established and also involves insulin resistance and vascular injury.

MCI, a potentially reversible pre-dementia stage, affects approximately 35% of older patients with diabetes, a prevalence 2.5-fold greater than that in non-diabetic older adults [3]. Its pathogenesis involves multifactorial interactions among metabolic disturbances, inflammatory pathways, and multidimensional modifiers [4–9]. Previous studies have attempted to develop predictive models for MCI in patients with T2DM using various methodologies. For example, Maimaitituerxun et al utilized a chi-squared automatic interaction detection (CHAID) decision tree to identify predictors [10], whereas Xu et al focused on biomarker discovery for early diagnosis [11]. Conventional risk models based on isolated clinical markers have shown limited predictive accuracy (area under the curve [AUC] <0.75) because they fail to consider complex variable interactions [12].

Machine learning (ML) offers a transformative framework to overcome these limitations. Unlike conventional univariate approaches, ML algorithms can automatically detect nonlinear relationships and multivariate interactions in high-dimensional data [13]. The ML methods used in this study – including least absolute shrinkage and selection operator (LASSO) regression for feature selection and eXtreme Gradient Boosting (XGBoost) for predictive modeling – were selected for their ability to manage complex data structures and capture nonlinear associations. As noted in Lin’s 2024 review in Medinformatics, these capabilities position artificial intelligence, particularly ML, as a powerful tool in medical informatics, capable of surpassing the constraints of existing analytical methods [14]. This approach has already enhanced predictive performance in multiple clinical domains, including suicide risk assessment, preeclampsia prediction, and disease detection [15–17]. For instance, Aher’s 2023 Medinformatics study demonstrated the efficacy of ML in disease detection through a novel model that achieved high accuracy and sensitivity in identifying heart disease [18]. By applying such advanced algorithms, medical research and practice can achieve deeper insights and more favorable patient outcomes across diverse conditions.

Therefore, this study aimed to identify factors associated with MCI through ML analysis of retrospective data from hospitalized older patients with T2DM.

Note: We acknowledge a reviewer’s suggestion to include an older adult control group without diabetes or dementia. However, this study exclusively enrolled patients from the Department of Endocrinology, which specializes in diabetes management. Because the hospital database lacked retrospective data for “older adults without diabetes or dementia,” the inclusion of such a control group was not feasible. A matched control group will be incorporated in future prospective studies.

Material and Methods

ETHICAL APPROVAL:

This study was approved by the Medical Ethics Committee of the First Affiliated Hospital of Anhui University of Science and Technology (Approval No.: 2025-KY-Y014-001). The study was conducted in accordance with the principles of the Declaration of Helsinki. The requirement for informed consent was waived because of the retrospective design.

STUDY DESIGN AND PARTICIPANTS:

This retrospective study included 503 inpatients older than 60 years with T2DM who received treatment at the First Affiliated Hospital of Anhui University of Science and Technology between January 2019 and January 2024.

INCLUSION AND EXCLUSION CRITERIA:

Inclusion criteria were as follows: (1) age greater than 60 years; (2) confirmed T2DM diagnosis according to standard guidelines [19]; and (3) availability of Mini-Mental State Examination (MMSE) assessment [20]. Exclusion criteria were as follows: (1) type 1 diabetes; (2) impaired consciousness or severe sensory deficits preventing MMSE completion; (3) history of epilepsy, Parkinson’s disease, Alzheimer’s disease, or other neurodegenerative disorders; (4) severe cardiopulmonary, hepatic, or renal insufficiency; (5) malignancy, autoimmune disease, active infection, or recent surgery (≤ 6 months); and (6) pregnancy or lactation.

DATA COLLECTION:

A standardized Case Report Form was used to collect data regarding 17 variables: demographic and lifestyle factors (age, sex, education level, occupation, smoking or drinking status, body mass index [BMI], waist-to-height ratio, waist-to-hip ratio, systolic and diastolic blood pressure, and diabetes duration) and laboratory parameters (triglycerides, total cholesterol, low-density lipoprotein cholesterol, serum creatinine, C-reactive protein [CRP], and glycated hemoglobin [HbA1c]).

All data were collected by trained researchers via standardized protocols.

HANDLING OF MISSING DATA:

Missing values were addressed by excluding cases with incomplete data for any of the 17 analyzed variables. Forty-seven cases were excluded for this reason, resulting in a final cohort of 503 cases. This approach ensured a complete-case analysis, which is appropriate for ML models that require full feature sets.

STUDY PROCEDURES:

To mitigate the effects of a relatively small sample size and reduce the risk of overfitting that may occur during ML with small datasets, the original 70/30 training–validation split was replaced with 5-fold cross-validation. All 503 samples were divided into 5 subsets of approximately equal size. In each iteration, 4 subsets were used for model training and hyperparameter optimization; the remaining subset was used for performance evaluation. This procedure was repeated 5 times so that every sample was used once for validation and 4 times for training. Final model performance metrics were reported as mean±standard deviation across the 5 folds (e.g., XGBoost AUC=0.892±0.032). Model efficacy was evaluated via receiver operating characteristic curve analysis (AUC), together with accuracy, sensitivity, specificity, and F1 score.
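
The 5-fold procedure described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study. The function names, the fixed random seed, the non-stratified shuffle, and the use of the sample standard deviation are assumptions made for the example, and the "model" is only a majority-class placeholder applied to synthetic labels that mirror the reported class counts.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding spreads the remainder, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, fit_and_score, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and repeat k times."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for held_out, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    # Mean and sample standard deviation across folds (assumed convention).
    return statistics.mean(scores), statistics.stdev(scores)


# Synthetic labels with the reported class counts (102 MCI, 401 normal).
labels = [1] * 102 + [0] * 401


def majority_baseline(train_idx, test_idx):
    """Placeholder model: always predict the majority class seen in training."""
    majority = round(sum(labels[i] for i in train_idx) / len(train_idx))
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


fold_sizes = sorted(len(fold) for fold in k_fold_indices(503))
mean_acc, sd_acc = cross_validate(503, majority_baseline)
print(fold_sizes)  # [100, 100, 101, 101, 101]
print(f"baseline accuracy: {mean_acc:.3f}±{sd_acc:.3f}")
```

Because 503 is not divisible by 5, the folds contain 100 or 101 samples. The placeholder also shows that a model predicting "normal" for every patient would already reach roughly 80% accuracy at this class ratio, which is one reason to read accuracy alongside sensitivity, specificity, and AUC.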

STATISTICAL ANALYSIS:

Statistical analyses were performed using SPSS version 26.0 and R version 3.6.3. All tests were 2-sided, and P-values <0.05 were considered statistically significant. Categorical variables were expressed as counts (percentages) and compared using the chi-squared test. Normally distributed continuous variables were presented as mean±standard deviation and analyzed using Student’s t-test; non-normally distributed variables were expressed as median (interquartile range) and analyzed using the Mann-Whitney U test. LASSO regression, implemented with the R package glmnet, was utilized for dimensionality reduction and predictor selection. LASSO regression parameters were optimized via 10-fold cross-validation [21].
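
For reference, the objective minimized by L1-penalized (LASSO) logistic regression for a binary outcome can be written as below. This is the standard form used by glmnet for a binomial family with a pure L1 penalty. The symbols (N patients, outcome y_i coded 0/1 for MCI, predictor vector x_i, intercept β_0, coefficients β, penalty weight λ) are notation introduced here for illustration and do not appear in the original text.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
          - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Larger values of λ shrink more coefficients exactly to zero, which is how the penalty performs predictor selection; the 10-fold cross-validation mentioned above is used to choose λ.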

SAMPLE SIZE JUSTIFICATION:

This retrospective study included all eligible cases of older patients with diabetes who met the inclusion criteria at the First Affiliated Hospital of Anhui University of Science and Technology between 2019 and 2024 (n=503). The inclusion criteria were: (1) age greater than 60 years; (2) diagnosis of diabetes based on World Health Organization criteria; and (3) availability of complete clinical data, including cognitive function assessment results.

Statistical power analysis confirmed that the sample size met the analytical requirements of ML models such as LASSO regression and XGBoost. This study included 17 candidate predictive variables (e.g., age, disease duration, HbA1c level), and the sample size of 503 cases was approximately 30 times the number of variables, well above the standard proposed by Monti et al [22], which recommends a sample size 10 to 20 times the number of variables to ensure model stability. The study included 102 cases of MCI and 401 cases in the normal control group, yielding a ratio of approximately 1:4. This distribution met the requirement described by Chiang et al [23] that “the sample size should cover the minimum number of events required by the model and allow for analysis of variable interactions,” and the resulting class imbalance was moderate enough to remain workable for XGBoost modeling. Internal validation through 5-fold cross-validation demonstrated minimal performance variation in the XGBoost model (AUC mean±standard deviation=0.892±0.032), meeting the model stability criteria proposed by McClure et al [24]. Furthermore, the sample size exceeded that of similar retrospective studies, such as Ma et al [25], reinforcing the robustness of the results.
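
The ratios quoted in this justification can be checked with simple arithmetic. The short Python sketch below uses only the counts reported in the text. The events-per-variable figures are an additional, commonly used heuristic that is not part of the original justification and are included here as an assumption-labelled illustration.

```python
n_total = 503          # final cohort
n_mci = 102            # MCI cases (events)
n_normal = 401         # normal-cognition cases
n_candidate_vars = 17  # candidate variables before LASSO
n_selected_vars = 8    # predictors retained by LASSO

samples_per_variable = n_total / n_candidate_vars
class_ratio = n_normal / n_mci
mci_fraction = n_mci / n_total

# Events per variable: not reported in the source, shown for illustration only.
events_per_candidate = n_mci / n_candidate_vars
events_per_selected = n_mci / n_selected_vars

print(f"samples per candidate variable: {samples_per_variable:.1f}")   # 29.6
print(f"normal : MCI ratio:             {class_ratio:.2f} : 1")        # 3.93 : 1
print(f"MCI fraction:                   {100 * mci_fraction:.2f}%")    # 20.28%
print(f"events per candidate variable:  {events_per_candidate:.2f}")   # 6.00
print(f"events per selected predictor:  {events_per_selected:.2f}")    # 12.75
```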

ML MODELS:

Logistic regression, XGBoost (via R package xgboost), and random forest (via R package randomForest) models were developed using R. Age was included as a mandatory core predictor in all models to control its potential confounding effect on the association between diabetes-related factors and MCI risk. Receiver operating characteristic curves were generated using the R package pROC to evaluate model performance. The SHapley Additive exPlanations (SHAP) framework (via R package SHAPforxgboost) was utilized to quantify feature importance in the XGBoost model. A nomogram was constructed using the R package rms to visualize the final predictive model. The overall study design is illustrated in Figure 1.
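
As background for the ROC-based comparison, the AUC can be read as the probability that a randomly chosen patient with MCI receives a higher predicted score than a randomly chosen patient without MCI. The sketch below computes it directly from that definition in Python; it is an illustration, not the pROC implementation, and the labels and scores are hypothetical values invented for the example.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for 3 MCI and 4 non-MCI patients.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))  # 0.9167 (11 of 12 pairs ranked correctly)
```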

Results

UNIVARIATE ANALYSIS OF DIABETES AND DIABETES WITH MCI:

In total, 503 patients were enrolled, including 401 in the diabetes without MCI (DM) group and 102 in the diabetes with MCI (DMMCI) group (MCI prevalence: 20.28%). The DM group comprised 206 men (51.4%) and 195 women (48.6%), whereas the DMMCI group comprised 47 men (46.1%) and 55 women (53.9%). The mean age was significantly higher in the DMMCI group than in the DM group (80.11±5.18 vs 74.63±5.05 years; P<0.001).

Significant differences between the 2 groups were observed in age, BMI, HbA1c, CRP, waist-to-height ratio, presence of diabetic complications, diabetes duration exceeding 5 years, and low education level (P<0.001 for all). No significant differences were found in systolic or diastolic blood pressure, low-density lipoprotein cholesterol, total cholesterol, triglycerides, creatinine, waist-to-hip ratio, or smoking and drinking history (P>0.05; Table 1).

SCREENING OF PREDICTIVE FACTORS FOR MCI IN OLDER PATIENTS WITH DIABETES USING LASSO REGRESSION:

Dimensionality reduction via LASSO regression identified 8 core predictive indicators – age, BMI, HbA1c, CRP, waist-to-height ratio, presence of diabetic complications, diabetes duration exceeding 5 years, and low education level – among 17 variables. These factors were used to construct the MCI prediction model for older patients with diabetes (Figure 2).

CONSTRUCTION OF PREDICTION MODELS USING 3 ML CLASSIFIERS:

To mitigate the limitations of a small sample size and reduce overfitting, 5-fold cross-validation was used to train and evaluate all models (XGBoost, random forest, and logistic regression) across the full dataset of 503 samples. This method ensured that all data were included in both training and validation phases; performance metrics were reported as mean±standard deviation values to demonstrate model robustness.

Among the models, XGBoost achieved the best overall performance, with an AUC of 0.892±0.032. The random forest model ranked second (AUC=0.863±0.035), followed by the logistic regression model (AUC=0.831±0.038). Detailed performance metrics are presented in Table 2.

The receiver operating characteristic curves of the models after 5-fold cross-validation (Figure 3A) confirmed the superior discriminative ability of XGBoost, as indicated by its more outwardly positioned curve. Comparison of key performance metrics (Figure 3B) showed that XGBoost outperformed the other models across all indicators, with small standard deviations (≤0.038), reflecting low variability and high stability of the cross-validation results.
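
For readers less familiar with the threshold-based metrics compared here, the sketch below shows how accuracy, sensitivity, specificity, and F1 score follow from a confusion matrix. It is written in Python for illustration. The counts are hypothetical and are not taken from Table 2, and the F1 score is assumed to be the usual positive-class definition (harmonic mean of precision and sensitivity), which the text does not specify.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true-positive rate (recall)
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical held-out fold: 20 MCI cases and 80 non-MCI cases.
metrics = classification_metrics(tp=16, fp=8, tn=72, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical fold, sensitivity is 0.800 and specificity is 0.900, yet the F1 score is only about 0.727 because precision depends on how rare the positive class is.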

SUPPLEMENTARY MULTIVARIABLE LOGISTIC REGRESSION ANALYSIS:

To enhance the interpretability of the findings and simplify our statistical approach, we conducted multivariable logistic regression using the same 8 core predictors identified via LASSO regression (age, BMI, HbA1c, CRP, waist-to-height ratio, diabetic complications, diabetes duration exceeding 5 years, and low education level). The logistic regression model demonstrated stable performance (AUC=0.831±0.038), consistent with the 5-fold cross-validation results presented in Table 2.

The multivariable logistic regression results indicated that age (odds ratio [OR]=1.12, 95% confidence interval [CI]: 1.07–1.18, P<0.001) and diabetic complications (OR=3.50, 95% CI: 1.80–6.82, P<0.001) were the strongest independent risk factors for MCI in older patients with diabetes. Diabetes duration exceeding 5 years (OR=2.10, 95% CI: 1.20–3.68, P=0.010), HbA1c (OR=1.13, 95% CI: 1.04–1.24, P=0.007), CRP (OR=1.08, 95% CI: 1.02–1.15, P=0.012), and BMI (OR=1.05, 95% CI: 1.01–1.09, P=0.016) were also statistically significant independent predictors. Low education level showed a borderline association (OR=0.75, 95% CI: 0.55–0.98, P=0.050), whereas waist-to-height ratio was not significantly associated with MCI in the multivariable model (OR=1.20, 95% CI: 0.95–1.52, P=0.125) (Table 3).
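
The link between an odds ratio, its 95% CI, and the reported P-value can be illustrated with a short calculation. The Python sketch below assumes Wald-type intervals that are symmetric on the log-odds scale, which the text does not state explicitly. Because the published ORs and CIs are rounded, the recomputed P-values match the reported ones only approximately.

```python
import math


def wald_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, z statistic, and two-sided P-value
    from an odds ratio and its 95% Wald confidence interval."""
    log_or = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log-odds scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return se, z, p_value


reported = {
    "age": (1.12, 1.07, 1.18),
    "waist-to-height ratio": (1.20, 0.95, 1.52),
    "low education level": (0.75, 0.55, 0.98),
}
recomputed = {name: wald_from_or(*vals) for name, vals in reported.items()}
for name, (se, z, p) in recomputed.items():
    print(f"{name}: z={z:.2f}, P={p:.3f}")
```

Under these assumptions, the interval for waist-to-height ratio, which spans 1, corresponds to a P-value near 0.13, and the interval for low education level, whose upper bound lies just below 1, corresponds to a P-value near 0.05.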

The ranking of factor importance derived from logistic regression (age > diabetic complications > diabetes duration > HbA1c > CRP > BMI > waist-to-height ratio > low education level) was largely consistent with the feature importance ranking from the XGBoost model (Figure 4A), corroborating the robustness and reliability of the core findings.

FEATURE IMPORTANCE OF PREDICTIVE FACTORS ACCORDING TO THE SHAP ALGORITHM:

The SHAP algorithm was used to assess the relative importance of each factor in the constructed prediction model. Importance rankings, based on mean absolute SHAP values, were as follows: age (mean SHAP=0.42) > diabetic complications (mean SHAP=0.38) > diabetes duration exceeding 5 years (mean SHAP=0.29) > HbA1c (mean SHAP=0.25) > CRP (mean SHAP=0.21) > waist-to-height ratio (mean SHAP=0.18) > BMI (mean SHAP=0.15) > low education level (mean SHAP=0.12) (Figure 4A). Features such as age and diabetic complications exerted a greater influence on the model output; education level and BMI had relatively smaller effects. A negative SHAP value indicated that the feature contributed to a lower predicted probability of MCI, whereas a positive SHAP value indicated a higher risk. The relationship between SHAP values and feature values showed that, for variables such as age and diabetic complications, higher feature values generally had a positive impact on model output. For some features, both the direction and magnitude of their influence on model predictions varied according to ranges of values, suggesting that distinct features exerted differential and context-dependent effects within the predictive model (Figure 4B).
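
To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values for a toy three-feature model by averaging each feature's marginal contribution over all feature orderings. It is a brute-force Python illustration with a single reference (baseline) input, not the tree-based algorithm used by SHAPforxgboost. The toy model, its coefficients, and the feature values are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    when features are switched from baseline to x in every possible order."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]
            new_output = model(current)
            phi[i] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in phi]


def toy_risk(features):
    """Hypothetical risk score with an age-by-HbA1c interaction term."""
    age, complications, hba1c = features
    return 0.5 * age + 0.3 * complications + 0.2 * age * hba1c


x = [1.0, 1.0, 1.0]         # patient of interest (standardized toy values)
baseline = [0.0, 0.0, 0.0]  # reference patient
phi = shapley_values(toy_risk, x, baseline)
print([round(v, 3) for v in phi])  # [0.6, 0.3, 0.1]
```

The contributions sum to the difference between the model output for the patient and for the baseline (1.0 here), and the interaction term is split equally between the two features involved. Averaging the absolute contributions over all patients yields a global importance ranking of the kind reported above.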

NOMOGRAM FOR PREDICTING MCI IN OLDER PATIENTS WITH DIABETES:

Eight predictive factors were used to construct a nomogram that demonstrated strong discriminatory capability in estimating MCI risk among patients with diabetes. Each variable (diabetic complications, disease duration exceeding 5 years, education level, HbA1c, CRP, BMI, waist-to-height ratio, and age) was assigned an individual scale axis. By aligning each factor value with the “Points Axis” at the top of the nomogram, factor-specific scores could be obtained; these scores were summed to generate a total score for overall risk estimation. For instance, in a 62-year-old patient with diabetes and coexisting complications, disease duration exceeding 5 years, medium education level, HbA1c level 8.5%, CRP concentration 3.2 mg/L, BMI 26.3 kg/m², and waist-to-height ratio 0.58, the corresponding scores were approximately 40, 30, 20, 15, 10, 5, and 25 points. The total score of 155 points reflected an estimated 55% risk of MCI, as shown on the probability scale at the bottom of the nomogram (Figure 5). The ORs and 95% CIs of all 8 predictors are presented in a forest plot (Figure 6), which illustrates both the direction and magnitude of each factor’s association with MCI risk.

Discussion

SUMMARY OF FINDINGS:

This study identified 8 major predictors of MCI in older patients with T2DM and developed an XGBoost-based prediction model with excellent performance (AUC=0.892). The results are consistent with findings by Maimaitituerxun et al [10], who emphasized the importance of age and education using a decision tree approach; they expand upon the work of Xu et al [11] by utilizing an ML framework to identify predictive factors in a clinical dataset.

ML plays a critical role in the study of diabetes-related MCI. It enables personalized treatment by incorporating patient-specific characteristics, as demonstrated in rheumatoid arthritis research [26]; facilitates early MCI risk prediction through multi-source data integration, as revealed in studies of brain metastasis and anticancer drug response [27,28]; and supports tracking of cognitive changes through continuous assessment. These capabilities align with principles of cognitive evaluation and diabetes management, enhancing the effectiveness of MCI care [7,29–31].

BENEFITS OF PREDICTING MCI IN PATIENTS WITH DIABETES THROUGH INTEGRATION OF MULTIPLE ALGORITHMS:

In this study, the XGBoost algorithm utilized parallel computation to accelerate model training and capture nonlinear relationships between metabolic and inflammatory indicators and MCI. This approach parallels the work of Sulaiman et al [32], who used ML to identify nonlinear associations between complex surgical factors and outcomes after transcatheter aortic valve implantation. The random forest algorithm demonstrated robustness to data noise through its unique sampling mechanism and effectively identified potential risk factors, similar to the findings of Amanollahi et al [33], who extracted key predictors of bipolar disorder from complex datasets. This capability enables clinicians to identify high-risk features through assessments of feature importance. Logistic regression provided an intuitive quantification of the relationship between risk factors and the probability of MCI onset by outputting probabilities, consistent with the approach used by Hou et al [34] to develop a predictive model for depressive risk in patients with coronary heart disease, thus offering a clear framework for clinical prediction.

CLINICAL TRANSLATION CHALLENGES OF THE PREDICTION MODEL FOR MCI IN PATIENTS WITH DIABETES:

Although our model effectively predicted MCI, several practical challenges remain. Both XGBoost and random forest exhibit “black box” characteristics, making it difficult for clinicians to interpret their decision processes and reducing clinical trust in their predictions – a limitation also noted by Piedimonte et al [35] when modeling treatment response for advanced ovarian cancer. In contrast, logistic regression is constrained by its linear assumptions and cannot fully represent the complex interactions among metabolic and neural systems involved in MCI onset among patients with diabetes. This limitation contributes to lower predictive accuracy, consistent with findings by Ran et al [36], who observed that rigid linear models fail to capture real-world complexity – a challenge addressed in the present study through XGBoost’s ability to model nonlinear relationships between metabolic-inflammatory indicators and MCI risk. Similar interpretability–accuracy dilemmas were observed by Zhao et al [37] when predicting depressive risk in patients with chronic obstructive pulmonary disease, where linear models were unable to capture multidimensional interactions. Other key factors hindering model implementation include ensuring balance between model accuracy and interpretability, as well as optimizing clinical data quality [38,39].

ENHANCING THE PREDICTION MODEL FOR MCI IN PATIENTS WITH DIABETES VIA MULTIDIMENSIONAL OPTIMIZATION:

Future research can advance in multiple directions. First, following the approach implemented by Mohammadzadeh et al [40] to predict survival among patients with high-grade glioma, a model fusion strategy can be adopted to integrate the strengths of multiple algorithms – for example, using the stacking method to combine the predictive power of XGBoost with the interpretability of logistic regression. Second, drawing on the experience of Tądel et al [41] in sepsis risk modeling, interpretable ML techniques such as SHAP can be introduced to enhance model transparency and assist clinicians with understanding the contributions of individual features to MCI prediction. Our SHAP analysis (Figure 4) already provides this interpretability, demonstrating that negative SHAP values indicate lower risk and positive values indicate higher risk. Third, the integration of metabolic (HbA1c), inflammatory (CRP), and behavioral (education level) features in our model aligns with the findings of Deng et al [42], who reported that multimodal fusion of high-dimensional data improves model robustness. Such improvement supports the strong performance of our XGBoost model (AUC=0.892±0.032). Fourth, building on the methods of Bai et al [43], who predicted heart failure risk in older patients with diabetes and hypertension, novel composite indicators – such as blood glucose stability and inflammatory-metabolic composite indices – can be constructed using medical knowledge and data mining to optimize feature engineering. Finally, in line with the precision medicine framework proposed by Ceccato et al [44] for inflammatory bowel disease, collaboration mechanisms between medical and engineering professionals should be established. Prospective studies and visual clinical interfaces should be developed to comprehensively enhance model performance and clinical applicability.

LIMITATIONS:

This study had some limitations. First, the sample size was limited, and data were collected from a single center, which may restrict model generalizability. Second, the model was developed using data from a Chinese population, potentially limiting its applicability to other ethnic groups. Third, some potentially relevant variables (e.g., apolipoprotein E genotype, detailed nutritional status, physical activity level, and specific medication history) were not included in the study due to data availability constraints. Fourth, external validation was not conducted. Finally, only a few ML algorithms were evaluated. Future large-scale, multicenter, and multiethnic studies are needed to further validate and refine the model.

Conclusions

Age, BMI, HbA1c, CRP, waist-to-height ratio, presence of diabetic complications, diabetes duration exceeding 5 years, and low education level constituted key predictive factors for MCI in older patients with diabetes. The XGBoost ML model constructed in this study demonstrated strong predictive performance in internal cross-validation. Pending external validation, it may assist clinicians with early identification of individuals at high risk of MCI among older patients with diabetes and provide a valuable reference for timely diagnostic and therapeutic decision-making.

Figures

Figure 1. Flowchart of the study. LASSO – least absolute shrinkage and selection operator; XGBoost – eXtreme Gradient Boosting; AUC – area under the curve. (Figure created using Microsoft PowerPoint, Version 2402; Microsoft Corporation, Redmond, WA, USA).

Figure 2. Coefficient diagram and adjustment in the LASSO regression model. LASSO – least absolute shrinkage and selection operator. (Figure created using R software, version 3.6.3; R Foundation for Statistical Computing, Vienna, Austria; package: glmnet).

Figure 3. (A) Model ROC curves after 5-fold cross-validation. (B) Comparison of model performance metrics. ROC – receiver operating characteristic; AUC – area under the curve; XGBoost – eXtreme Gradient Boosting. (Figure created using R software, version 3.6.3; package: ggplot2).

Figure 4. (A) SHAP value plot of the XGBoost machine learning model. (B) Corresponding SHAP value dependence plot. SHAP – SHapley Additive exPlanations; XGBoost – eXtreme Gradient Boosting; BMI – body mass index; HbA1c – glycated hemoglobin; CRP – C-reactive protein. (Figure created using R software, version 3.6.3; package: SHAPforxgboost).

Figure 5. Nomogram of the prediction model for cognitive impairment in older patients with diabetes. Abbreviations: DMMCI, diabetes mellitus with mild cognitive impairment. (Figure created using R software, version 3.6.3; package: rms).

Figure 6. Forest plot of odds ratios for MCI risk factors (logistic regression). MCI – mild cognitive impairment; OR – odds ratio; CI – confidence interval; BMI – body mass index; HbA1c – glycated hemoglobin; CRP – C-reactive protein. (Figure created using R software, version 3.6.3; package: forestplot).

Tables

Table 1. Comparative analysis of clinical profiles between patients with diabetes mellitus and those with diabetes mellitus with mild cognitive impairment.

Table 2. Model performance metrics after 5-fold cross-validation.

Table 3. Multivariable logistic regression analysis of risk factors for mild cognitive impairment.

References

1. You Y, Liu Z, Chen Y, The prevalence of mild cognitive impairment in type 2 diabetes mellitus patients: A systematic review and meta-analysis: Acta Diabetol, 2021; 58(6); 671-85

2. Liu H, Chen J, Ling J, The association between diabetes mellitus and postoperative cognitive dysfunction: A systematic review and meta-analysis: Int J Surg, 2025; 111(3); 2633-50

3. Yingxu L, Tan X, Fangyi L, Risk factors for mild cognitive impairment in type 2 diabetes mellitus older adult: A systematic review and meta-analysis: J Psychiatr Res, 2025; 186; 445-57

4. Gonzales PNG, Ampil ER, Catindig-Dela Rosa JS, Increased risk of Alzheimer’s disease with glycemic variability: A systematic review and meta-analysis: Cureus, 2024; 16(11); e73353

5. Soleymani Y, Batouli SAH, Ahangar AA, Association of glycosylated hemoglobin concentrations with structural and functional brain changes in the normoglycemic population: A systematic review: J Neuroendocrinol, 2024; 36(11); e13437

6. Daneshvar S, Moradi F, Rahmani M, Association of serum levels of inflammation and oxidative stress markers with cognitive outcomes in multiple sclerosis: A systematic review: J Clin Neurosci, 2025; 132; 110990

7. He CYY, Zhou Z, Kan MMP, Modifiable risk factors for mild cognitive impairment among cognitively normal community-dwelling older adults: A systematic review and meta-analysis: Ageing Res Rev, 2024; 99; 102350

8. Qiu W, Zhang Y, Modifiable risk factors for cognitive frailty in older Chinese patients with diabetes: A systematic review and meta-analysis: Res Nurs Health, 2025; 48(1); 73-84

9. Wei J, Zhu X, Liu J, Estimating global prevalence of mild cognitive impairment and dementia in elderly with overweight, obesity, and central obesity: A systematic review and meta-analysis: Obes Rev, 2025; 26(5); e13882

10. Maimaitituerxun R, Chen S, Ban C, Risk factors for mild cognitive impairment in patients with type 2 diabetes mellitus: A retrospective study: Neurol Res, 2023; 45(4); 331-39

11. Xu W, Chen S, Feng L, Biomarker discovery for early diagnosis of mild cognitive impairment in type 2 diabetes mellitus based on metabolomics: J Integr Neurosci, 2022; 21(5); 132

12. van der Heijden AA, Nijpels G, Badloe F, Prediction models for development of retinopathy in people with type 2 diabetes: Systematic review and external validation in a Dutch primary care setting: Diabetologia, 2020; 63(6); 1110-19

13. Guo P, Tu Y, Liu R, Performance of risk prediction models for diabetic foot ulcer: A meta-analysis: PeerJ, 2024; 12; e17770

14. Lin H, Artificial intelligence with great potential in medical informatics: A brief review: Medinformatics, 2024; 1(1); 2-9

15. Ehtemam H, Sadeghi Esfahlani S, Sanaei A, Role of machine learning algorithms in suicide risk prediction: A systematic review-meta analysis of clinical studies: BMC Med Inform Decis Mak, 2024; 24(1); 138

16. Dalil D, Esmaeili S, Safaee E, The prediction of venous thromboembolism using artificial intelligence and machine learning in lower extremity arthroplasty: A systematic review: Arthroplast Today, 2025; 33; 101672

17. Malik V, Agrawal N, Prasad S, Prediction of preeclampsia using machine learning: A systematic review: Cureus, 2024; 16(12); e76095

18. Aher CN, Enhancing heart disease detection using political deer hunting optimization-based deep Q-network with high accuracy and sensitivity: Medinformatics, 2023 Available from: https://ojs.bonviewpress.com/index.php/MEDIN/article/view/1377

19. Zhao WG, Interpretation of the Chinese clinical guidelines for the prevention and treatment of type 2 diabetes mellitus in the elderly (2022 edition): J Peking Union Med Coll Hosp, 2022; 13(4); 574-80

20. Zhang YP, Zhu QR, Analysis of incidence and influencing factors of mild cognitive impairment in elderly patients with cerebral infarction and memory disorders: Guizhou Med J, 2022; 46(11); 1770-71

21. Joshi M, Singh BK, Deep learning techniques for brain lesion classification using various MRI (from 2010 to 2022): Review and challenges: Medinformatics, 2024 bonviewMEDIN42021686

22. Monti CB, Ambrogi F, Sardanelli F, Sample size calculation for data reliability and diagnostic performance: A go-to review: Eur Radiol Exp, 2024; 8(1); 79

23. Chiang AY, Guth BD, Pugsley MK, The evaluation of endpoint variability and implications for study statistical power and sample size in conscious instrumented dogs: J Pharmacol Toxicol Methods, 2018; 92; 43-51

24. McClure LA, Szychowski JM, Benavente O, A post hoc evaluation of a sample size re-estimation in the Secondary Prevention of Small Subcortical Strokes study: Clin Trials, 2016; 13(5); 537-44

25. Maimaitituerxun R, Chen W, Xiang J, Identifying a prediction model for mild cognitive impairment in type 2 diabetes mellitus: A CHAID decision tree analysis: Brain Behav, 2024; 14(3); e3456

26. Xu ZP, Yang SL, Zhao S, Biomarkers for early diagnosis of mild cognitive impairment in type 2 diabetes mellitus: A multicenter, retrospective, nested case-control study: EBioMedicine, 2016; 5; 105-13

27. Singh DP, Kaushik B, A systematic literature review for the prediction of anticancer drug response using various machine-learning and deep-learning techniques: Chem Biol Drug Des, 2023; 101(1); 175-94

28. Habibi MA, Rashidi F, Habibzadeh A, Prediction of the treatment response and local failure of patients with brain metastasis treated with stereotactic radiosurgery using machine learning: A systematic review and meta-analysis: Neurosurg Rev, 2024; 47(1); 1999

29. Curtis EM, Miguel M, McEvoy C, Impact of dementia and mild cognitive impairment on bone health in older people: Aging Clin Exp Res, 2024; 37(1); 5

30. Li J, Dan K, Ai J, Machine learning in the prediction of immunotherapy response and prognosis of melanoma: a systematic review and meta-analysis: Front Immunol, 2024; 15; 1281940

31. Młynarska E, Czarnik W, Dzieża N, Type 2 diabetes mellitus: New pathogenetic mechanisms, treatment and the most important complications: Int J Mol Sci, 2025; 26(3); 1094

32. Sulaiman R, Atick Faisal MA, Hasan M, Machine learning for predicting outcomes of transcatheter aortic valve implantation: A systematic review: Int J Med Inform, 2025; 197; 105840

33. Amanollahi M, Jameie M, Looha MA, Machine learning applied to the prediction of relapse, hospitalization, and suicide in bipolar disorder using neuroimaging and clinical data: A systematic review: J Affect Disord, 2024; 361; 778-97

34. Hou XZ, Wu Q, Lv QY, Development and external validation of a risk prediction model for depression in patients with coronary heart disease: J Affect Disord, 2024; 367; 137-47

35. Piedimonte S, Mohamed M, Rosa G, Predicting response to treatment and survival in advanced ovarian cancer using machine learning and radiomics: A systematic review: Cancers (Basel), 2025; 17(3); 336

36. Ran X, Suyaroj N, Tepsan W, A novel fuzzy system-based genetic algorithm for trajectory segment generation in urban global positioning system: J Adv Res, 2025 [Online ahead of print]

37. Zhao X, Wang Y, Li J, A machine-learning-derived online prediction model for depression risk in COPD patients: A retrospective cohort study from CHARLS: J Affect Disord, 2025; 377; 284-93

38. Michalitsi K, Metallinou D, Diamanti A, Artificial intelligence in predicting the mode of delivery: A systematic review: Cureus, 2024; 16(9); e69115

39. Sulague RM, Beloy FJ, Medina JR, Artificial intelligence in cardiac surgery: A systematic review: World J Surg, 2024; 48(9); 2073-89

40. Mohammadzadeh I, Hajikarimloo B, Niroomand B, Application of artificial intelligence in forecasting survival in high-grade glioma: Systematic review and meta-analysis involving 79,638 participants: Neurosurg Rev, 2025; 48(1); 240

41. Tądel K, Dudek A, Bil-Lula I, AI algorithms for modeling the risk, progression, and treatment of sepsis, including early-onset sepsis – A systematic review: J Clin Med, 2024; 13(19); 5959

42. Deng W, Wang J, Guo A, Quantum differential evolutionary algorithm with quantum-adaptive mutation strategy and population state evaluation framework for high-dimensional problems: Information Sciences, 2024; 676; 120787

43. Bai Q, Chen H, Gao Z, Advanced prediction of heart failure risk in elderly diabetic and hypertensive patients using nine machine learning models and novel composite indices: Insights from NHANES 2003–2016: Eur J Prev Cardiol, 2025 [Online ahead of print]

44. Ceccato HD, Silva T, Genaro LM, Artificial intelligence use for precision medicine in inflammatory bowel disease: A systematic review: Am J Transl Res, 2025; 17(1); 28-46

Figures

Figure 1. Flowchart of the study. LASSO – least absolute shrinkage and selection operator; XGBoost – eXtreme Gradient Boosting; AUC – area under the curve. (Figure created using Microsoft PowerPoint, Version 2402; Microsoft Corporation, Redmond, WA, USA).

Figure 2. Coefficient diagram and adjustment in the LASSO regression model. LASSO – least absolute shrinkage and selection operator. (Figure created using R software, version 3.6.3; R Foundation for Statistical Computing, Vienna, Austria; package: glmnet).

Figure 3. (A) Model ROC curves after 5-fold cross-validation. (B) Comparison of model performance metrics. ROC – receiver operating characteristic; AUC – area under the curve; XGBoost – eXtreme Gradient Boosting. (Figure created using R software, version 3.6.3; package: ggplot2).

Figure 4. (A) SHAP value plot of the XGBoost machine learning model. (B) Corresponding SHAP value dependence plot. SHAP – SHapley Additive exPlanations; XGBoost – eXtreme Gradient Boosting; BMI – body mass index; HbA1c – glycated hemoglobin; CRP – C-reactive protein. (Figure created using R software, version 3.6.3; package: SHAPforxgboost).

Figure 5. Nomogram of the prediction model for cognitive impairment in older patients with diabetes. DMMCI – diabetes mellitus with mild cognitive impairment. (Figure created using R software, version 3.6.3; package: rms).

Figure 6. Forest plot of odds ratios for MCI risk factors (logistic regression). MCI – mild cognitive impairment; OR – odds ratio; CI – confidence interval; BMI – body mass index; HbA1c – glycated hemoglobin; CRP – C-reactive protein. (Figure created using R software, version 3.6.3; package: forestplot).

Tables

Table 1. Comparative analysis of clinical profiles between patients with diabetes mellitus and those with diabetes mellitus with mild cognitive impairment.

Table 2. Model performance metrics after 5-fold cross-validation.

Table 3. Multivariable logistic regression analysis of risk factors for mild cognitive impairment.
