A Hybrid Model Focusing on Data Pre-Processing in Diabetes Diagnosis

Zeidi, Farnaz; Azar, Lalah; Arslan, Vasfiye; EROL, Çiğdem

doi:10.1080/01969722.2022.2080338

Zeidi F., Azar L., Arslan V., EROL Ç.

CYBERNETICS AND SYSTEMS, vol.54, no.7, pp.1199-1211, 2023 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 54 Issue: 7
Publication Date: 2023
DOI: 10.1080/01969722.2022.2080338
Journal Name: CYBERNETICS AND SYSTEMS
Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, BIOSIS, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
Pages: pp.1199-1211
Keywords: Classification algorithms, diabetes diagnosis, hybrid model, K-means algorithm, normalization, outliers detection
Istanbul University Affiliated: Yes

Abstract

Diabetes mellitus is a common and serious disease that has been studied by many researchers, and the Pima Indians Diabetes Dataset is one of the best-known datasets in this field. This study focuses on the pre-processing stages in order to increase the accuracy of machine learning algorithms in diagnosing the disease and to reveal patterns that enable its early diagnosis. The pre-processing stage of the proposed hybrid model consists of filling in missing values with KNN, evaluating six different normalization methods, and removing outliers with K-means. In the classification stage, four algorithms (C4.5, SVM, Naive Bayes, and KNN) were examined to find the best hybrid model, and the models were evaluated by accuracy. Compared with previous studies, the proposed models achieved higher accuracies: 98.3% for (KNN + n5 + K-means + SVM) and 99.1% for (KNN + n4/n3 + K-means + KNN), where n3, n4, and n5 denote normalization methods from among the six examined. Finally, we offer concluding remarks and suggestions for further study.
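The abstract names the pre-processing steps of the hybrid model but not their parameters, so the following is only a minimal plain-Python sketch of how such a pipeline (KNN imputation, normalization, K-means outlier removal, KNN classification) can be chained. Several assumptions are not from the paper: missing values are represented as `None`; imputation averages the k = 2 nearest donor rows by Euclidean distance over co-observed features; min-max scaling stands in for whichever methods n3, n4, and n5 denote; a point counts as an outlier when its distance to its K-means centroid exceeds the mean plus 2 standard deviations of all such distances; and the classifier is a k = 3 majority-vote KNN. All function names and the toy rows are hypothetical and are not taken from the Pima dataset.

```python
import math
from collections import Counter


def partial_dist(a, b):
    """Euclidean distance over features observed (not None) in both rows."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if not pairs:
        return math.inf  # no shared features, so treat the rows as unrelated
    return math.sqrt(sum((u - v) ** 2 for u, v in pairs))


def knn_impute(rows, k=2):
    """Fill each None with the mean of that feature over the k nearest donor rows."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                donors = [r for m, r in enumerate(rows) if m != i and r[j] is not None]
                donors.sort(key=lambda r: partial_dist(row, r))
                nearest = donors[:k]
                filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled


def min_max_params(rows):
    """Per-column minimum and range (assumed stand-in normalization)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) - min(c) for c in cols]


def scale(row, lows, spans):
    """Map a row to [0, 1] per column using training minima and ranges."""
    return [(v - lo) / s if s else 0.0 for v, lo, s in zip(row, lows, spans)]


def kmeans(points, k, iters=20):
    """Lloyd's algorithm; centroids are seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]

    def assign():
        return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign()


def remove_outliers(X, y, k=2, t=2.0):
    """Drop points whose distance to their centroid exceeds mean + t * std (assumed rule)."""
    centroids, labels = kmeans(X, k)
    d = [math.dist(x, centroids[lab]) for x, lab in zip(X, labels)]
    mean = sum(d) / len(d)
    std = math.sqrt(sum((v - mean) ** 2 for v in d) / len(d))
    keep = [i for i, v in enumerate(d) if v <= mean + t * std]
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return Counter(y_train[i] for i in order[:k]).most_common(1)[0][0]


def fit(X_raw, y):
    """Impute, normalize, then remove outliers; returns what prediction needs."""
    filled = knn_impute(X_raw)
    lows, spans = min_max_params(filled)
    X_clean, y_clean = remove_outliers([scale(r, lows, spans) for r in filled], y)
    return lows, spans, X_clean, y_clean


# Hypothetical toy rows: [glucose, BMI]; None marks a missing value.
X_raw = [[85, 26.6], [89, 28.1], [None, 25.8], [94, 27.0],
         [183, 43.1], [168, 38.0], [175, None], [190, 40.2], [120, 90.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0]

lows, spans, X_clean, y_clean = fit(X_raw, y)
print(len(X_clean))  # the extreme [120, 90.0] row is removed -> 8
print(knn_predict(X_clean, y_clean, scale([150, 35.0], lows, spans)))  # -> 1
```

A query point is scaled with the minima and ranges learned from the training rows, which keeps it on the same [0, 1] scale as the cleaned data. Swapping in another normalization or classifier only means replacing `scale` or `knn_predict`; the accuracy figures reported above come from the paper's own models, not from this sketch.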