Assessing the Quality of AI-Generated Responses to Botulinum Toxin Applications in Bruxism Therapy

Özdal Zincir, Özge; Cifci Ozkan, Esra; Hatipoğlu, Şirin

doi:10.1097/scs.0000000000012195

JOURNAL OF CRANIOFACIAL SURGERY, vol.37, no.3/4, pp.658-662, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 37, Issue: 3/4
Publication Date: 2026
DOI: 10.1097/scs.0000000000012195
Journal Name: JOURNAL OF CRANIOFACIAL SURGERY
Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, MEDLINE
Pages: 658-662
Istanbul University Affiliation: No

Abstract

This study aimed to evaluate the accuracy, reliability, and clarity of information provided by artificial intelligence (AI)-based language models regarding botulinum toxin applications for bruxism treatment. A team comprising an oral and maxillofacial surgeon and 2 orthodontists developed 85 open-ended questions across 7 domains: indications, contraindications, procedures, complications, prognoses, advantages, and disadvantages of botulinum toxin use in bruxism treatment. The questions were entered into 3 AI chatbots: ChatGPT-4.0, Google Gemini, and Microsoft Copilot. The researchers independently evaluated the generated responses using 5 predefined accuracy categories: "Objectively True," "Selected Facts," "Minimal Facts," "Incorrect," and "Misleading." Statistical analysis, including χ² (chi-square) and Fisher exact tests, was conducted to assess differences in response accuracy among the AI models, with significance set at P<0.05. A statistically significant association was found between the AI model and response accuracy (P<0.001). ChatGPT-4.0 predominantly delivered answers classified as "Objectively True," outperforming Google Gemini and Microsoft Copilot. Across all domains, ChatGPT-4.0 maintained higher rates of accurate responses, whereas Google Gemini and Microsoft Copilot frequently provided answers categorized as "Selected Facts" or "Minimal Facts." Although AI-based language models, particularly ChatGPT-4.0, may serve as a useful adjunct for preliminary patient education and professional reference regarding botulinum toxin use in bruxism, they cannot replace evidence-based clinical judgment. Practitioners should remain vigilant about the risk of misinformation and should validate AI-generated content against established guidelines and authoritative sources before applying it in patient care.
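The abstract reports a χ² test of association between chatbot and accuracy category. As a rough illustration only (this is not the authors' analysis code, and their software is not stated), a Pearson chi-square test of independence on a models × categories table of counts can be sketched in plain Python. The function names are hypothetical. The p-value uses the closed-form chi-square tail that holds only for even degrees of freedom; that covers any table with 3 rows (such as 3 chatbots), since df = 2(c − 1). The toy table is invented so the arithmetic can be checked by hand and is not data from the study. The Fisher exact test, which the study also used, is not reproduced here.

```python
import math
from typing import List, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with EVEN df.

    Closed form valid only for even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # turn (x/2)^(i-1)/(i-1)! into (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def chi2_independence(table: List[List[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence for an r x c table of counts.

    Rows would be chatbots and columns accuracy categories (hypothetical layout).
    Returns (statistic, degrees_of_freedom, p_value).
    """
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[r][c] for r in range(n_rows)) for c in range(n_cols)]
    grand_total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("remove empty rows/columns: their expected counts would be zero")
    stat = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            # Expected count under independence: row total * column total / N
            expected = row_totals[r] * col_totals[c] / grand_total
            stat += (table[r][c] - expected) ** 2 / expected
    df = (n_rows - 1) * (n_cols - 1)
    return stat, df, chi2_sf_even_df(stat, df)


if __name__ == "__main__":
    # Toy table invented for a hand-checkable result; NOT data from the study.
    toy = [[10, 0], [0, 10], [5, 5]]
    stat, df, p = chi2_independence(toy)
    print(f"chi2={stat:.1f}, df={df}, p={p:.3g}")  # chi2=20.0, df=2, p=4.54e-05
```

With 3 chatbots and the 5 accuracy categories, the degrees of freedom would be (3 − 1)(5 − 1) = 8. If some categories (for example "Incorrect" or "Misleading") receive few responses, expected counts become small and the chi-square approximation grows unreliable. That is a standard reason to report an exact test alongside it.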