Medical Science Monitor

07 April 2026: Clinical Research  

AI-Assisted Dialysis Decision-Making: Assessing Agreement Between ChatGPT and Nephrologist in Initial Dialysis Indication and Prescription in Emergency and ICU Settings

Ali Veysel Kara ABCDEF 1*, Ersin V. Ozturk BCDE 2, Mahmut Sami Islamoglu BCD 2, Ridvan V. Ozdemir BCD 2, Serhat V. Hayme CD 3

DOI: 10.12659/MSM.951942

Med Sci Monit 2026; 32:e951942

Abstract

BACKGROUND: Artificial intelligence (AI) is increasingly explored as a clinical decision-support tool in nephrology; however, its real-world applicability for dialysis decision-making in emergency and intensive care unit (ICU) settings remains insufficiently studied. Hemodialysis initiation and prescription are complex, time-sensitive, and dynamic processes that require expert clinical judgment.

MATERIAL AND METHODS: This retrospective observational study evaluated agreement between AI-generated (ChatGPT) and nephrologist-made dialysis decisions in emergency and ICU settings. Adult patients undergoing first-time dialysis were included. Agreement was assessed for dialysis initiation, modality selection, and key prescription parameters. To ensure clinical relevance, continuous prescription variables were categorized into predefined ranges. Agreement was quantified using Gwet’s AC1 coefficient and Cramér’s V statistic.

RESULTS: Eighty-four patients were included. AI demonstrated 100% agreement with nephrologists regarding dialysis initiation. Overall agreement for dialysis modality selection was 92.9% (Cramér’s V=0.87, P<0.001). Agreement for core dialysis prescription parameters – including blood flow rate, dialysate sodium, potassium, and calcium concentrations – was high across modalities (all P<0.001). Lower agreement was observed for ultrafiltration-related parameters, particularly ultrafiltration duration, reflecting the individualized and dynamic nature of volume management during dialysis.

CONCLUSIONS: AI-assisted decision support demonstrated high agreement with nephrologist decisions for initial dialysis initiation, modality selection, and core prescription parameters in emergency and ICU settings. Discrepancies were primarily confined to ultrafiltration-related decisions, underscoring the necessity of ongoing bedside clinical judgment. These findings support the role of AI as a decision-support tool rather than a replacement for clinician-led dialysis management.

Keywords: Artificial Intelligence, Comparative Study, Dialysis, Emergency Medical Services, Intensive Care Units, Nephrology

Introduction

The integration of artificial intelligence (AI) into medical decision-making has accelerated rapidly, driven by its potential to improve clinical accuracy, efficiency, and patient outcomes across multiple medical disciplines [1–3]. Contemporary AI systems are increasingly capable of synthesizing large volumes of clinical data and supporting clinician decision-making in complex, time-sensitive environments.

In nephrology, timely and appropriate decision-making is particularly critical because acute kidney injury (AKI) is common among hospitalized and critically ill patients and is strongly associated with increased mortality, prolonged hospitalization, and higher healthcare costs [4–6]. A substantial proportion of critically ill patients with AKI ultimately require renal replacement therapy (RRT), and delays or inappropriate initiation of RRT can adversely affect clinical outcomes [6–10].

Hemodialysis initiation in emergency department and intensive care unit (ICU) settings is frequently prompted by life-threatening complications such as severe metabolic acidosis, hyperkalemia, uremic manifestations, and fluid overload with pulmonary edema [7,11]. However, dialysis initiation and prescription are complex processes that often must be determined rapidly in hemodynamically unstable patients. These challenges can be further amplified in resource-limited settings, where nephrology workforce shortages restrict timely access to specialist consultation and can contribute to delayed or non-standardized dialysis decisions [12,13].

Recent advances in AI – particularly machine learning techniques applied to large-scale electronic health record data – have demonstrated strong performance in AKI prediction and risk stratification. Deep learning models have been shown to predict AKI trajectories prior to clinical recognition in large, real-world datasets [14]. Beyond prediction, AI is increasingly used as a clinical decision-support tool capable of synthesizing clinical, laboratory, and physiologic data to support standardized decision-making at the point of care [1–3,15].

Despite growing interest, the real-world applicability and safety of AI-generated dialysis recommendations remain insufficiently studied, particularly in emergency and ICU environments. This is an important knowledge gap because hemodialysis is inherently dynamic: prescriptions – especially ultrafiltration – often require continuous reassessment and adjustment based on intradialytic hemodynamic tolerance and evolving clinical status [16–19]. Consequently, the extent to which AI-supported approaches can contribute to early dialysis decision pathways while preserving essential clinician oversight remains uncertain.

Material and Methods

STUDY DESIGN AND SETTING:

This retrospective observational study was conducted at Mengucek Gazi Training and Research Hospital between January 1, 2024, and July 1, 2024. The primary objective was to evaluate the agreement between dialysis decisions generated by an artificial intelligence (AI) model (ChatGPT) and those made by nephrologists in emergency department and intensive care unit (ICU) settings.

A total of 97 patients were initially screened. After applying predefined inclusion and exclusion criteria, 84 patients were included in the final analysis (Figure 1). The study protocol was approved by the Erzincan Binali Yildirim University Non-Interventional Clinical Research Ethics Committee (Meeting No: 16, Decision No: 2024-16/06).

STUDY POPULATION AND INCLUSION CRITERIA:

Adult patients (≥18 years) were eligible for inclusion if they met all of the following criteria:

Patients were excluded if they had a previous history of dialysis (n=5), incomplete medical records or missing clinical data (n=6), or incomplete dialysis prescription details (n=2), as complete datasets were required for reliable AI–nephrologist agreement assessment.

DATA COLLECTION AND STANDARDIZED AI COMPARISON PROCESS:

Patient data were retrospectively extracted from electronic health records (EHRs) and nephrology consultation notes. To ensure standardized data input and objective comparison between AI-generated and nephrologist-made decisions, a structured Hemodialysis Decision and Prescription Data Form was developed a priori and applied uniformly to all patients. This standardized form systematically captured the following information:

The use of this structured data collection framework ensured that identical clinical information was presented to both the nephrologist and the AI model in a consistent format, thereby enabling a fair, reproducible, and unbiased assessment of agreement. The data collection form can be made available for transparency and reproducibility purposes upon reasonable request.

ARTIFICIAL INTELLIGENCE MODEL AND DATA INPUT:

ChatGPT (GPT-4, OpenAI, USA) was used as an AI-based decision-support tool. The model was accessed via the ChatGPT web interface (https://chat.openai.com). No custom training or fine-tuning was applied.

To minimize variability related to potential model updates, all patient data were entered into ChatGPT on the same day using a predefined standardized prompt structure. ChatGPT had no access to nephrologist decisions or outcomes. AI-generated recommendations were recorded independently and compared with nephrologist prescriptions.

STANDARDIZATION AND CLASSIFICATION OF DIALYSIS PRESCRIPTION PARAMETERS:

Dialysis prescription parameters were initially recorded as continuous numerical values. To avoid misclassification of clinically insignificant numerical differences as true disagreements, parameters were categorized into predefined clinically meaningful ranges.

Dialysis duration and ultrafiltration duration were categorized as ≤3 h or >3 h; ultrafiltration volume as ≤2.0 L or >2.0 L; blood flow rate as ≤200 mL/min or >200 mL/min; and dialysate electrolyte concentrations were grouped according to routinely used clinical ranges in standard hemodialysis practice. This classification approach ensured that minor numerical variations (eg, dialysate sodium 135 vs 136 mmol/L) were not interpreted as discordant decisions.

By implementing this classification system, agreement between AI-generated and nephrologist-made decisions was evaluated in a clinically meaningful manner.
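The categorization rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the parameter names, the category labels, and the helper functions `categorize` and `concordant` are hypothetical. Only the cut-offs (3 h, 2.0 L, 200 mL/min) are taken from the text. Dialysate electrolyte ranges are omitted because the article does not state them.

```python
# Cut-offs stated in the Methods; the dictionary keys are hypothetical names.
THRESHOLDS = {
    "dialysis_duration_h": 3.0,   # <=3 h vs >3 h
    "uf_duration_h": 3.0,         # <=3 h vs >3 h
    "uf_volume_l": 2.0,           # <=2.0 L vs >2.0 L
    "blood_flow_ml_min": 200.0,   # <=200 mL/min vs >200 mL/min
}


def categorize(parameter: str, value: float) -> str:
    """Map a continuous prescription value to its binary category label."""
    cutoff = THRESHOLDS[parameter]
    return f"<={cutoff:g}" if value <= cutoff else f">{cutoff:g}"


def concordant(parameter: str, nephrologist_value: float, ai_value: float) -> bool:
    """Two prescriptions agree when they fall in the same category."""
    return categorize(parameter, nephrologist_value) == categorize(parameter, ai_value)


# Hypothetical values: 180 and 200 mL/min share a category, 2.0 and 2.5 L do not.
same_flow = concordant("blood_flow_ml_min", 180, 200)
same_volume = concordant("uf_volume_l", 2.0, 2.5)
```

Under this scheme a small numerical difference counts as disagreement only when it crosses a cut-off, which is the behavior the classification is meant to produce.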

COMPARISON OF NEPHROLOGIST AND AI DECISIONS:

Nephrologist dialysis decisions were extracted from consultation notes and entered into the structured data form by a researcher. Independently, the same patient data were entered into ChatGPT, and AI-generated recommendations were recorded.

Agreement was assessed for dialysis initiation, dialysis modality selection, and key dialysis prescription parameters.

To ensure data accuracy, all extracted data were independently reviewed by 2 researchers, and discrepancies were resolved by consensus.

HANDLING OF MISSING DATA:

No imputation techniques were applied in this study. Patients with incomplete clinical data, missing laboratory values, or incomplete dialysis prescription details were excluded during the screening process. Only patients with complete datasets were included in the final analysis to ensure reliable and unbiased AI–nephrologist agreement assessment.

CONFOUNDING ASSESSMENT AND STRATIFICATION:

As the primary aim of this study was to assess agreement rather than to evaluate outcomes or establish causality, traditional multivariable adjustment models were not applied.

Potential confounding was addressed through a design-based approach:

This stratified approach was considered appropriate to enhance the robustness and interpretability of agreement analyses.

STATISTICAL ANALYSIS:

Continuous variables are presented as median and interquartile range (IQR), and categorical variables as number and percentage. Baseline characteristics were compared descriptively across dialysis modality groups. Agreement between AI-generated and nephrologist-made decisions was assessed using Gwet’s AC1 coefficient and Cramér’s V coefficient, with statistical significance defined as P<0.05.
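For readers unfamiliar with Gwet's AC1, the following is a minimal sketch of the standard two-rater formula, AC1 = (pa − pe) / (1 − pe), where pa is the observed proportion of agreement and pe = [1/(q − 1)] Σ πk(1 − πk), with πk the mean of the two raters' marginal proportions for category k and q the number of categories. The article does not report which software was used, so the function `gwet_ac1` and the example ratings are assumptions for illustration only.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b):
    """Gwet's AC1 for two raters assigning one category per subject."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of subjects.")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    q = len(categories)
    # Observed agreement: share of subjects with identical categories.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    if q < 2:
        # Only one category was ever used, so every rating agrees.
        return 1.0
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement based on the averaged marginal proportions.
    chance = 0.0
    for category in categories:
        pi_k = (counts_a[category] + counts_b[category]) / (2 * n)
        chance += pi_k * (1 - pi_k)
    chance /= q - 1
    return (observed - chance) / (1 - chance)


# Hypothetical categorized ratings for 10 patients (not study data).
nephrologist = ["<=3 h"] * 9 + [">3 h"]
chatgpt = ["<=3 h"] * 8 + [">3 h"] * 2
ac1 = gwet_ac1(nephrologist, chatgpt)
```

In this hypothetical example 9 of 10 ratings match (pa = 0.90) and pe = 0.255, giving an AC1 of about 0.87. AC1 is often preferred over Cohen's kappa when one category dominates, which is a plausible situation for categorized prescription parameters.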

ETHICS APPROVAL:

This study was approved by the Erzincan Binali Yildirim University Non-Interventional Clinical Research Ethics Committee (Meeting No: 16, Decision No: 2024-16/06). The study was conducted in accordance with the principles outlined in the Declaration of Helsinki.

INFORMED CONSENT:

As this study was retrospective and involved the review of existing patient records without any direct intervention, informed consent was not required. The ethics committee waived the need for informed consent for this analysis.

Results

STUDY POPULATION AND BASELINE CHARACTERISTICS:

A total of 84 patients met the eligibility criteria and were included in the final analysis (Figure 1). The median age of the study population was 77.0 years (interquartile range [IQR], 70.0–84.0), and 47 patients (56.0%) were female.

Baseline demographic, clinical, and laboratory characteristics of the study population, stratified according to nephrologist-selected dialysis modality (isolated ultrafiltration [UF], hemodialysis without ultrafiltration [HD without UF], and hemodialysis with ultrafiltration [HD+UF]), are presented in Table 1.

There were no statistically significant differences between the groups in terms of age, urine output, arterial pH, sex distribution, or intubation status at consultation (all P>0.05), indicating comparable baseline demographic characteristics and overall clinical severity.

Serum creatinine, blood urea nitrogen, and potassium levels differed significantly across groups (P=0.021, P=0.018, and P=0.035, respectively), consistent with the biochemical indications influencing dialysis modality selection.

The presence of volume overload on imaging showed a marked difference between groups (P<0.001), being most prevalent in patients treated with HD+UF. In addition, heart failure and coronary artery disease were more common in this group (P<0.001 and P=0.02, respectively), whereas other comorbidities were similarly distributed across modalities.

Overall, these differences reflect clinical indication-driven dialysis modality selection rather than baseline imbalance between groups.

AGREEMENT ON DIALYSIS INITIATION:

ChatGPT demonstrated complete agreement with nephrologist decisions regarding dialysis initiation. Dialysis was recommended by both ChatGPT and the consulting nephrologist for all 84 patients, resulting in 100% agreement for dialysis initiation.

Agreement was defined as concordant binary decisions (dialysis indicated vs not indicated) based on identical clinical and laboratory data provided to both the nephrologist and ChatGPT. As all included patients were prescribed dialysis by nephrologists, no discordant cases were observed.

AGREEMENT ON DIALYSIS MODALITY SELECTION:

Agreement between nephrologist-prescribed and ChatGPT-recommended dialysis modality is presented in Table 2. Overall, ChatGPT demonstrated 92.9% agreement with nephrologist decisions regarding dialysis modality selection. Cramér’s V analysis revealed a strong association between AI-generated and nephrologist-made decisions (Cramér’s V=0.87, P<0.001).

Among patients prescribed isolated ultrafiltration by the nephrologist (n=5), ChatGPT recommended isolated UF in 4 cases (80%). In 1 case (20%), ChatGPT recommended hemodialysis with ultrafiltration instead. Given the limited sample size, these findings are presented descriptively.

For patients prescribed hemodialysis without ultrafiltration (n=26), ChatGPT provided the same modality recommendation in 23 cases (88.5%). In the remaining 3 cases (11.5%), ChatGPT recommended hemodialysis with ultrafiltration instead. Among patients prescribed hemodialysis with ultrafiltration (n=53), ChatGPT recommended the same modality in 51 cases (96.2%). In 2 cases (3.8%), ChatGPT recommended hemodialysis without ultrafiltration.
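The counts reported in the two paragraphs above are enough to rebuild the 3×3 modality table and check the summary statistics. The sketch below is an illustrative recomputation, not the authors' analysis code. The zero cells are inferred from the row totals, and Cramér's V is computed from the uncorrected Pearson chi-square, which is an assumption because the article does not state whether a correction was applied.

```python
import math

# Rows: nephrologist decision; columns: ChatGPT recommendation.
# Order for both: isolated UF, HD without UF, HD+UF.
MODALITY_TABLE = [
    [4, 0, 1],    # nephrologist: isolated UF (n=5)
    [0, 23, 3],   # nephrologist: HD without UF (n=26)
    [0, 2, 51],   # nephrologist: HD+UF (n=53)
]


def percent_agreement(table):
    """Share of patients on the diagonal, in percent."""
    total = sum(sum(row) for row in table)
    diagonal = sum(table[i][i] for i in range(len(table)))
    return 100.0 * diagonal / total


def cramers_v(table):
    """Cramér's V from the uncorrected Pearson chi-square statistic."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


agreement = percent_agreement(MODALITY_TABLE)  # 78 of 84 patients
association = cramers_v(MODALITY_TABLE)
```

With these counts, 78 of 84 decisions match (92.9%) and Cramér's V is approximately 0.87, consistent with the values reported above.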

HEMODIALYSIS WITHOUT ULTRAFILTRATION:

Agreement in dialysis prescription parameters among patients treated with hemodialysis without ultrafiltration is summarized in Table 3. Overall agreement was high across all parameters. Complete agreement was observed for dialysis surface area and dialysate sodium concentration. Strong agreement was also noted for dialysis duration, blood flow rate, and dialysate potassium and calcium concentrations, with Gwet’s AC1 coefficients indicating substantial to almost perfect agreement (all P<0.001).

HEMODIALYSIS WITH ULTRAFILTRATION:

For patients treated with hemodialysis with ultrafiltration (n=53), agreement in prescription parameters is summarized in Table 4. ChatGPT demonstrated high agreement with nephrologist prescriptions across all evaluated parameters. The highest agreement rates were observed for blood flow rate and dialysate sodium levels (98%). Agreement for dialysis duration, ultrafiltration amount, dialysis surface area, potassium, and calcium levels remained high.

Ultrafiltration duration showed the lowest agreement (82.3%), reflecting the dynamic and individualized nature of volume management during dialysis. Nevertheless, overall agreement remained strong, with Gwet’s AC1 coefficients confirming statistically significant reliability across all parameters (P<0.001).

SUMMARY OF AGREEMENT PATTERNS:

Across all analyses, AI-assisted decision support closely agreed with nephrologist decisions on dialysis initiation, modality selection, and core prescription parameters. Variability was largely confined to ultrafiltration-related decisions, particularly ultrafiltration duration, whereas parameters such as blood flow rate and dialysate sodium concentration showed near-complete agreement.

Discussion

Even when an AI-generated dialysis prescription agrees with a nephrologist’s prescription, it may still be suboptimal because hemodialysis is a dynamic intervention requiring continuous reassessment. In routine clinical practice, dialysis prescriptions are frequently modified in response to real-time changes in blood pressure, vasopressor requirements, intradialytic symptoms, ultrafiltration tolerance, and evolving biochemical parameters – factors that cannot be fully captured by static chart-based data [16–19]. Accordingly, agreement at the time of initial prescription should not be interpreted as equivalence in dynamic bedside management. Nor should even a high level of agreement be taken as evidence of clinical appropriateness in real-time dialysis management or as endorsement of autonomous AI-based prescribing without continuous physician oversight.

Within this predefined scope, the high agreement observed in dialysis initiation and modality selection is clinically meaningful. AKI in critically ill patients is associated with substantial morbidity and mortality, and delays in initiating RRT can worsen outcomes [4–6]. Randomized trials comparing accelerated versus delayed RRT initiation – including STARRT-AKI, AKIKI, and ELAIN – have demonstrated that timing is clinically consequential and that treatment pathways must be carefully evaluated within real-world constraints [8–10]. In this context, AI-assisted decision-support tools that improve structured and rapid triage may have pragmatic value, particularly in emergency and ICU settings where time-sensitive decisions are required and immediate access to nephrology expertise can be limited [12,13].

In our study, the most clinically relevant disagreement was observed in ultrafiltration-related decisions, which is expected given the physiology of volume removal during intermittent hemodialysis. The ultrafiltration strategy is highly individualized and closely linked to intradialytic hypotension risk and cardiovascular stress [17–19]. Intradialytic hypotension is common, variably defined, and associated with adverse cardiovascular outcomes in observational studies [17,18]. Moreover, dialysis adequacy and safety frameworks emphasize that prescription parameters must be interpreted within the patient’s hemodynamic context rather than applied as fixed, uniform targets [16]. These considerations highlight the inherent limitations of static-data AI outputs for ultrafiltration management.

Conclusions

Our findings support a conservative interpretation: AI-generated dialysis prescriptions should be regarded as decision-support suggestions rather than definitive directives. This interpretation aligns with the broader medical AI literature, which emphasizes that the safest and most effective applications of AI are those integrated into clinician-led workflows with human oversight, particularly in high-risk clinical environments [1–3,15]. In nephrology, the potential utility of AI may be greatest in settings with constrained specialist availability, where AI could help standardize early decision pathways while ensuring that final responsibility remains with clinicians [12,13].

Future research should prioritize prospective study designs that incorporate real-time physiologic and treatment-response data streams, including hemodynamic trends, ultrafiltration tolerance, and intradialytic events, which are central to dialysis safety [16–19]. Multicenter validation will also be essential to improve generalizability and to better define the clinical contexts in which AI support is beneficial versus potentially misleading. Overall, our data suggest that AI can approximate nephrologist reasoning for initial dialysis decisions in real-world emergency and ICU settings, but it cannot substitute for dynamic bedside assessment – particularly for ultrafiltration – where ongoing clinician judgment remains essential.

References

1. Yu KH, Beam AL, Kohane IS, Artificial intelligence in healthcare: Past, present and future: Nat Biomed Eng, 2018; 2(10); 719-31

2. Topol EJ, High-performance medicine: The convergence of human and artificial intelligence: Nat Med, 2019; 25(1); 44-56

3. Rajkomar A, Dean J, Kohane I, Machine learning in medicine: N Engl J Med, 2019; 380(14); 1347-58

4. Hoste EAJ, Bagshaw SM, Bellomo R, Epidemiology of acute kidney injury in critically ill patients: The multinational AKI-EPI study: Intensive Care Med, 2015; 41; 1411-23

5. Chertow GM, Burdick E, Honour M, Acute kidney injury, mortality, length of stay, and costs in hospitalized patients: J Am Soc Nephrol, 2005; 16(11); 3365-70

6. Uchino S, Kellum JA, Bellomo R, Acute renal failure in critically ill patients: A multinational, multicenter study: JAMA, 2005; 294(7); 813-18

7. Kidney Disease: Improving Global Outcomes (KDIGO) Acute Kidney Injury Work Group, KDIGO clinical practice guideline for acute kidney injury: Kidney Int Suppl, 2012; 2(1); 1-138

8. STARRT-AKI Investigators; Canadian Critical Care Trials Group, Timing of initiation of renal-replacement therapy in acute kidney injury: N Engl J Med, 2020; 383; 240-51

9. Gaudry S, Hajage D, Schortgen F, Initiation strategies for renal-replacement therapy in the intensive care unit: N Engl J Med, 2016; 375; 122-33

10. Zarbock A, Kellum JA, Schmidt C, Effect of early vs delayed initiation of renal replacement therapy on mortality in critically ill patients with acute kidney injury (ELAIN): JAMA, 2016; 315(20); 2190-99

11. Kellum JA, Lameire N, KDIGO AKI Guideline Work Group, Diagnosis, evaluation, and management of acute kidney injury: Kidney Int, 2013; 83(5); 816-25

12. Liyanage T, Ninomiya T, Jha V, Worldwide access to treatment for end-stage kidney disease: A systematic review: Lancet, 2015; 385(9981); 1975-82

13. Osman MA, Alrukhaimi M, Ashuntantang GE, Global nephrology workforce: Gaps and opportunities toward a sustainable kidney care system: Kidney Int Suppl (2011), 2018; 8(2); 52-63

14. Tomašev N, Glorot X, Rae JW, A clinically applicable approach to continuous prediction of acute kidney injury: Nature, 2019; 572(7767); 116-19

15. Obermeyer Z, Emanuel EJ, Predicting the future – big data, machine learning, and clinical medicine: N Engl J Med, 2016; 375(13); 1216-19

16. National Kidney Foundation, KDOQI Clinical practice guideline for hemodialysis adequacy: 2015 update: Am J Kidney Dis, 2015; 66(5); 884-930

17. Assimon MM, Flythe JE, Definitions of intradialytic hypotension: Semin Dial, 2017; 30(6); 464-72

18. Stefánsson BV, Brunelli SM, Cabrera C, Intradialytic hypotension and risk of cardiovascular disease: Clin J Am Soc Nephrol, 2014; 9(12); 2124-32

19. Flythe JE, Curhan GC, Brunelli SM, Disentangling the ultrafiltration rate–mortality association: The respective roles of session length and weight gain: Clin J Am Soc Nephrol, 2013; 8(7); 1151-61

Medical Science Monitor eISSN: 1643-3750