01 April 2026 : Database Analysis
AI-Powered Clinical Decision Support in Dentistry: Comparative Evaluation of Large Language Models for Oral Medicine and Periodontal Diagnosis
Rayan Mohammedfarooq Meer ABCE 1, Abdullah Alqarni CDE 2, Basem Mohammed Akily BEF 3, Hattan Zaki DEF 3, Mostafa Ibrahim Fayad
DOI: 10.12659/MSM.951721
Med Sci Monit 2026; 32:e951721
Table 2 Descriptive statistics with 95% CIs and effect sizes.
| Evaluation criteria | ChatGPT Mean±SD | ChatGPT 95% CI | ChatGPT Cohen’s d | Gemini Mean±SD | Gemini 95% CI | Gemini Cohen’s d | Copilot Mean±SD | Copilot 95% CI | Copilot Cohen’s d |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 4.717 ±0.633 | [4.599, 4.835] | 0.447 | 4.127 ±1.149 | [3.902, 4.353] | 0.759 | 4.441 ±0.752 | [4.293, 4.589] | 0.743 |
| Time efficiency | 4.929 ±0.290 | [4.875, 4.983] | 0.244 | 4.304 ±0.993 | [4.109, 4.499] | 0.701 | 4.578 ±0.535 | [4.473, 4.683] | 0.788 |
| Ease of use | 4.920 ±0.272 | [4.870, 4.971] | 0.293 | 4.324 ±1.016 | [4.124, 4.523] | 0.666 | 4.559 ±0.590 | [4.443, 4.675] | 0.748 |
| Clarity of explanation | 4.832 ±0.399 | [4.758, 4.906] | 0.422 | 4.137 ±0.879 | [3.965, 4.310] | 0.981 | 4.255 ±0.817 | [4.094, 4.415] | 0.912 |
| Comprehensiveness | 4.832 ±0.480 | [4.742, 4.921] | 0.350 | 4.137 ±0.890 | [3.962, 4.312] | 0.969 | 4.118 ±0.871 | [3.947, 4.289] | 1.013 |
| Ability to answer questions | 4.920 ±0.357 | [4.854, 4.987] | 0.223 | 4.324 ±0.881 | [4.151, 4.496] | 0.768 | 4.500 ±0.741 | [4.354, 4.646] | 0.675 |
| Reliability | 4.779 ±0.495 | [4.686, 4.871] | 0.447 | 4.294 ±0.907 | [4.116, 4.472] | 0.778 | 4.549 ±0.639 | [4.423, 4.675] | 0.706 |
| Diagnostic range | 4.841 ±0.474 | [4.752, 4.929] | 0.336 | 4.225 ±0.943 | [4.040, 4.411] | 0.821 | 4.461 ±0.740 | [4.315, 4.606] | 0.728 |
| Composite score | 4.846 ±0.075 | [4.757, 4.931] | 0.782 | 4.234 ±0.888 | [4.308, 4.570] | 0.959 | 4.433 ±0.163 | [4.077, 4.391] | 0.709 |
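The caption does not state what each Cohen’s d is measured against. One reading that reproduces the per-criterion rows to within rounding is a one-sample effect size: the distance from the mean rating to the top of the 5-point scale, divided by the SD. The sketch below checks that reading against the Accuracy row. The reference value of 5 and the helper name `one_sample_d` are assumptions made for illustration, not a method confirmed by this excerpt.

```python
# Hypothetical reference point: the maximum of the 1-5 rating scale.
# This is inferred from the table values, not stated in the source.
SCALE_MAX = 5.0


def one_sample_d(mean: float, sd: float, reference: float = SCALE_MAX) -> float:
    """Standardized distance between a mean rating and a reference value."""
    return (reference - mean) / sd


# Accuracy row of Table 2: (mean, SD, reported Cohen's d)
accuracy = {
    "ChatGPT": (4.717, 0.633, 0.447),
    "Gemini": (4.127, 1.149, 0.759),
    "Copilot": (4.441, 0.752, 0.743),
}

for model, (mean, sd, reported_d) in accuracy.items():
    recomputed = one_sample_d(mean, sd)
    print(f"{model}: recomputed d = {recomputed:.3f}, reported d = {reported_d}")
```

Differences in the last digit (for example 0.760 recomputed versus 0.759 reported for Gemini) are expected, because the table's means and SDs are themselves rounded to three decimals.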






