Author attribution on streaming data


Seker Ş. E., Al-Naami K., Khan L.

2013 IEEE 14th International Conference on Information Reuse and Integration, IEEE IRI 2013, San Francisco, CA, United States, 14-16 August 2013, pp. 497-503

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/iri.2013.6642511
  • City of Publication: San Francisco, CA
  • Country of Publication: United States
  • Pages: pp. 497-503
  • Keywords: author recognition, authorship attribution, big data, data mining, natural language processing, POS Tagging, text mining
  • Affiliated with Istanbul University: No

Abstract

The problem of novel authors appearing in a streaming data source, such as evolving social media, has not been addressed until now. Existing author attribution techniques deal with datasets in which the total number of authors does not change between the training and testing of the classifiers. This study focuses on the question, 'what happens if new authors are added to the system over time?'. It also deals with the problem that some authors may not stay: they may disappear over time, or reappear after a while. Stream mining approaches are proposed to solve these problems. The test scenarios are built on the existing IMDB62 data set, which is already widely used by author attribution algorithms. We used our own shuffling algorithms to create the effect of novel authors. Before the stream mining step, POS tagging and TF-IDF methods are applied for feature extraction. We also applied a bi-tag approach, in which two consecutive tags are treated as a new feature. With the help of the novel techniques proposed for the first time in this paper, the success rate for authorship attribution on streaming text data has been increased from 35% to 61%. © 2013 IEEE.
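
The bi-tag feature extraction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a tagger or a TF-IDF variant, so the sketch assumes the documents are already POS-tagged (Penn Treebank style tags), uses relative term frequency with IDF = log(N / df), and the helper names `bitags` and `tfidf` are hypothetical.

```python
import math
from collections import Counter


def bitags(tags):
    """Join each pair of consecutive POS tags into one feature, e.g. 'DT_NN'."""
    return [f"{a}_{b}" for a, b in zip(tags, tags[1:])]


def tfidf(docs):
    """Return one {feature: weight} dict per document.

    Assumed variant: TF is the relative frequency within the document and
    IDF is log(N / df). The abstract does not state which variant was used.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each feature occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical pre-tagged documents, one tag sequence per review.
tagged_docs = [
    ["DT", "NN", "VBZ", "JJ"],
    ["DT", "NN", "VBD", "RB", "JJ"],
]
features = [bitags(tags) for tags in tagged_docs]
weights = tfidf(features)
```

A bi-tag that occurs in every document, such as `DT_NN` here, receives a weight of zero under this variant, while bi-tags specific to one document keep a positive weight. The resulting weight dictionaries could then be passed to a stream classifier; that step is not shown because the abstract gives no details about it.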