Do Large Language Models Perform Equally Across Languages? A Comparison of Responses to Frequently Asked Questions in Anesthesiology

Hadi Ufuk Yörükoğlu; Can Aksu; Pervez Sultan; Serkan Tulgar

doi:10.12659/MSM.951815

28 February 2026: Clinical Research

Med Sci Monit 2026; 32:e951815

Table 7. Proportion of responses rated as acceptable and satisfactory (Likert scale 4–5) across the large language models and evaluation metrics.

	ChatGPT (English)	ChatGPT (Turkish)	DeepSeek (English)	DeepSeek (Turkish)
Content quality	78.8%	64.2%	70.3%	64.2%
Communication quality	77.0%	67.0%	70.9%	62.4%
Overall quality	77.8%	65.6%	70.6%	63.3%
