28 February 2026 : Clinical Research
Do Large Language Models Perform Equally Across Languages? A Comparison of Responses to Frequently Asked Questions in Anesthesiology
Hadi Ufuk Yörükoğlu ABCEF 1*, Can Aksu AE 2, Pervez Sultan AE 3, Serkan Tulgar ABEF 4
DOI: 10.12659/MSM.951815
Med Sci Monit 2026; 32:e951815
Table 7. Proportion of responses rated as acceptable and satisfactory (Likert scale 4–5) across the large language models and evaluation metrics.
| | ChatGPT English | ChatGPT Turkish | DeepSeek English | DeepSeek Turkish |
|---|---|---|---|---|
| Content quality | 78.8% | 64.2% | 70.3% | 64.2% |
| Communication quality | 77.0% | 67.0% | 70.9% | 62.4% |
| Overall quality | 77.8% | 65.6% | 70.6% | 63.3% |
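
To make the English-Turkish contrast in Table 7 easier to read, the short Python sketch below subtracts each Turkish proportion from its English counterpart. The percentages are copied from the table; the names `table7` and `language_gaps` are illustrative assumptions, not part of the study's analysis. The gaps are descriptive differences in percentage points and say nothing about statistical significance.

```python
# Percentages copied from Table 7: share of responses rated 4-5 on the Likert scale.
# Each tuple is (English, Turkish). The variable and function names are hypothetical.
table7 = {
    "Content quality": {"ChatGPT": (78.8, 64.2), "DeepSeek": (70.3, 64.2)},
    "Communication quality": {"ChatGPT": (77.0, 67.0), "DeepSeek": (70.9, 62.4)},
    "Overall quality": {"ChatGPT": (77.8, 65.6), "DeepSeek": (70.6, 63.3)},
}


def language_gaps(table):
    """Return English-minus-Turkish gaps in percentage points, per metric and model."""
    return {
        metric: {
            model: round(english - turkish, 1)
            for model, (english, turkish) in models.items()
        }
        for metric, models in table.items()
    }


gaps = language_gaps(table7)
for metric, by_model in gaps.items():
    for model, gap in by_model.items():
        print(f"{metric:<22} {model:<9} {gap:+.1f} pp")
```

For every metric the English proportion is higher, and the gap is larger for ChatGPT than for DeepSeek (for example, 14.6 versus 6.1 percentage points for content quality).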






