01 April 2026: Database Analysis
AI-Powered Clinical Decision Support in Dentistry: Comparative Evaluation of Large Language Models for Oral Medicine and Periodontal Diagnosis
Rayan Mohammedfarooq Meer ABCE 1, Abdullah Alqarni CDE 2, Basem Mohammed Akily BEF 3, Hattan Zaki DEF 3, Mostafa Ibrahim Fayad
DOI: 10.12659/MSM.951721
Med Sci Monit 2026; 32:e951721
Table 4. Pairwise comparisons of the three chatbots (Wilcoxon signed-rank test; significance threshold P<0.05). Each cell gives the P value followed by the effect size r.
| Evaluation criteria | ChatGPT vs Gemini | ChatGPT vs Copilot | Gemini vs Copilot |
|---|---|---|---|
| Accuracy | <0.001 (r=2.348)*** | =0.004 (r=3.510)** | =0.013 (r=3.647)* |
| Time efficiency | <0.001 (r=0.152)*** | <0.001 (r=1.275)*** | =0.014 (r=4.637)* |
| Ease of use | <0.001 (r=0.784)*** | <0.001 (r=0.500)*** | =0.045 (r=4.314)* |
| Clarity of explanation | <0.001 (r=1.407)*** | <0.001 (r=1.765)*** | =0.251 (r=6.029) ns |
| Comprehensiveness | <0.001 (r=0.760)*** | <0.001 (r=1.422)*** | =0.810 (r=6.015) ns |
| Ability to answer questions | <0.001 (r=0.588)*** | <0.001 (r=0.676)*** | =0.067 (r=3.922) ns |
| Reliability | <0.001 (r=1.132)*** | =0.005 (r=2.667)** | =0.012 (r=2.985)* |
| Diagnostic range | <0.001 (r=1.098)*** | <0.001 (r=1.515)*** | =0.046 (r=5.745)* |

Significance levels: *** P<0.001; ** P<0.01; * P<0.05; ns, not significant. Effect size (r): small=0.1, medium=0.3, large=0.5.
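
The comparisons in Table 4 rest on the Wilcoxon signed-rank test for paired ratings and an effect size r. The Python code below is a minimal sketch, not the authors' analysis code. It assumes the common normal-approximation form of the test (with tie correction, no continuity correction) and Rosenthal's definition r = |Z|/√n, where n is the number of non-zero paired differences. Conventions vary, and some authors divide by the total number of observations instead. The sketch computes Z, a two-sided P value and r, then labels them with the conventional significance codes and the effect-size thresholds from the table footnote. The function names, the "negligible" label for r<0.1 and the example ratings are hypothetical. Under this definition |Z| ≤ √n, so r always lies between 0 and 1.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired ratings x, y.

    Normal approximation with tie correction, no continuity correction.
    Returns (Z, P, r) with r = |Z| / sqrt(n) (assumed Rosenthal convention),
    where n counts the non-zero differences.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    w_plus = sum(rk for rk, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    tie_term = sum(t**3 - t for t in Counter(abs_diffs).values()) / 48
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P from the standard normal
    r = abs(z) / math.sqrt(n)
    return z, p, r


def stars(p):
    """Conventional significance codes (*** P<0.001, ** P<0.01, * P<0.05)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


def effect_label(r):
    """Label r using the footnote thresholds; 'negligible' is an assumed label."""
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "negligible"


# Hypothetical ratings of the same 8 items by two chatbots (not study data).
chatbot_a = [5, 4, 5, 3, 4, 5, 4, 5]
chatbot_b = [3, 4, 2, 3, 2, 4, 3, 1]
z, p, r = wilcoxon_signed_rank(chatbot_a, chatbot_b)
print(f"Z={z:.2f}, P={p:.3f} {stars(p)}, r={r:.2f} ({effect_label(r)})")
# prints: Z=2.21, P=0.027 *, r=0.90 (large)
```

For real analyses, an established implementation such as `scipy.stats.wilcoxon` is preferable. Its P values can differ from this sketch for small samples, where it may use the exact null distribution rather than the normal approximation.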






