Recurring and novel class detection using class-based ensemble for evolving data stream

Al-Khateeb, Tahseen; Masud, Mohammad; Al-Naami, Khaled; Seker, Şadi; Mustafa, Ahmad; Khan, Latifur; Trabelsi, Zouheir; Aggarwal, Charu; Han, Jiawei

doi:10.1109/tkde.2015.2507123

IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 10, pp. 2752-2764, 2016 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 28 Issue: 10
Publication Date: 2016
DOI: 10.1109/tkde.2015.2507123
Journal Name: IEEE Transactions on Knowledge and Data Engineering
Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 2752-2764
Keywords: Database applications, data mining, clustering, classification, and association rules
Affiliated with Istanbul University: No

Abstract

Streaming data is a prominent subject of concept-evolution studies. When a new class appears in a data stream, it can be regarded as a new concept, and hence as an instance of concept-evolution. One interesting problem in concept-evolution studies, identified in our previous study, is that of recurring classes: in data streams, a class can disappear and then reappear after a while. Existing data stream classification techniques either misclassify instances of a recurring class or falsely identify the recurring class as a novel class. Both kinds of error increase the error rates of those techniques. In this paper, we address the problem by defining a novel ensemble technique, the 'class-based' ensemble, which replaces the traditional 'chunk-based' approach in order to detect recurring classes. We describe two different approaches to the class-based ensemble and compare them in detail. Unlike previous studies in the field, we also demonstrate empirically the superiority of both 'class-based' ensemble methods over state-of-the-art techniques on a number of benchmark data sets, including web comments as a text mining challenge.