AI-Powered Clinical Decision Support in Dentistry: Comparative Evaluation of Large Language Models for Oral Medicine and Periodontal Diagnosis

01 April 2026 : Database Analysis

Rayan Mohammedfarooq Meer^{ABCE 1}, Abdullah Alqarni^{CDE 2}, Basem Mohammed Akily^{BEF 3}, Hattan Zaki^{DEF 3}, Mostafa Ibrahim Fayad^{CDEF 4}, Mohammed Hosny H. AbdElaziz^{DEF 4}, Mohamed Omar Elboraey^{ABCEF 1,5*}

DOI: 10.12659/MSM.951721

Med Sci Monit 2026; 32:e951721

Table 2 Descriptive statistics with 95% CIs and effect sizes.

Evaluation criteria	ChatGPT Mean±SD	ChatGPT 95% CI	ChatGPT Cohen’s d	Gemini Mean±SD	Gemini 95% CI	Gemini Cohen’s d	Copilot Mean±SD	Copilot 95% CI	Copilot Cohen’s d
Accuracy	4.717 ±0.633	[4.599, 4.835]	0.447	4.127 ±1.149	[3.902, 4.353]	0.759	4.441 ±0.752	[4.293, 4.589]	0.743
Time efficiency	4.929 ±0.290	[4.875, 4.983]	0.244	4.304 ±0.993	[4.109, 4.499]	0.701	4.578 ±0.535	[4.473, 4.683]	0.788
Ease of use	4.920 ±0.272	[4.870, 4.971]	0.293	4.324 ±1.016	[4.124, 4.523]	0.666	4.559 ±0.590	[4.443, 4.675]	0.748
Clarity of explanation	4.832 ±0.399	[4.758, 4.906]	0.422	4.137 ±0.879	[3.965, 4.310]	0.981	4.255 ±0.817	[4.094, 4.415]	0.912
Comprehensiveness	4.832 ±0.480	[4.742, 4.921]	0.350	4.137 ±0.890	[3.962, 4.312]	0.969	4.118 ±0.871	[3.947, 4.289]	1.013
Ability to answer questions	4.920 ±0.357	[4.854, 4.987]	0.223	4.324 ±0.881	[4.151, 4.496]	0.768	4.500 ±0.741	[4.354, 4.646]	0.675
Reliability	4.779 ±0.495	[4.686, 4.871]	0.447	4.294 ±0.907	[4.116, 4.472]	0.778	4.549 ±0.639	[4.423, 4.675]	0.706
Diagnostic range	4.841 ±0.474	[4.752, 4.929]	0.336	4.225 ±0.943	[4.040, 4.411]	0.821	4.461 ±0.740	[4.315, 4.606]	0.728
Composite score	4.846 ±0.075	[4.757, 4.931]	0.782	4.234 ±0.888	[4.077, 4.391]	0.959	4.433 ±0.163	[4.308, 4.570]	0.709
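
The relationship between the Mean±SD and 95% CI columns can be illustrated with a minimal sketch. It assumes a normal-approximation interval (mean ± z·SD/√n); the table does not state the sample size or the exact interval method, so `n_assumed` is a hypothetical placeholder and the computed bounds are not expected to reproduce the published ones. The helper names `mean_ci` and `ci_contains_mean` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(mean: float, sd: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for a sample mean."""
    # Two-sided critical value; roughly 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


def ci_contains_mean(mean: float, ci: tuple[float, float]) -> bool:
    """Sanity check: a symmetric CI should bracket its own point estimate."""
    low, high = ci
    return low <= mean <= high


# ChatGPT accuracy row from Table 2 (mean 4.717, SD 0.633).
# ASSUMPTION: n_assumed is a hypothetical sample size, not taken from the table.
n_assumed = 100
low, high = mean_ci(4.717, 0.633, n_assumed)
print(f"Approximate 95% CI with n={n_assumed}: [{low:.3f}, {high:.3f}]")

# Check that the published interval for the same row brackets its mean.
print(ci_contains_mean(4.717, (4.599, 4.835)))
```

The second helper can be run over each row as a quick consistency check, since any reported interval that does not contain its own mean points to a transcription or layout error rather than a statistical result.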
