2013 IEEE 14th International Conference on Information Reuse and Integration, IEEE IRI 2013, San Francisco, CA, United States, 14-16 August 2013, pp. 497-503
The problem of novel authors appearing in a streaming data source, such as evolving social media, has not been addressed until now. Existing authorship attribution techniques deal with datasets in which the total number of authors does not change between the training and testing of the classifiers. This study focuses on the question, 'What happens if new authors are added to the system over time?' We also address the cases in which some authors disappear over time or reappear after a while. Stream mining approaches are proposed to solve the problem. The test scenarios are built on the existing IMDB62 dataset, which is already widely used by authorship attribution algorithms. We used our own shuffling algorithms to create the effect of novel authors. Before stream mining, POS tagging and TF-IDF are applied for feature extraction. We also apply a bi-tag approach, in which two consecutive tags are treated as a new feature. With the novel techniques proposed for the first time in this paper, the success rate for authorship attribution on streaming text data increased from 35% to 61%. © 2013 IEEE.
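The bi-tag and TF-IDF feature extraction mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the helper names `bi_tags` and `tf_idf`, the Penn Treebank-style tags, the toy documents, and the particular TF-IDF variant (relative term frequency times log(N / df)) are all assumptions made for illustration, and the input is assumed to be already POS-tagged.

```python
import math
from collections import Counter


def bi_tags(tags):
    """Join each POS tag with its successor to form one bi-tag feature."""
    return [f"{a}_{b}" for a, b in zip(tags, tags[1:])]


def tf_idf(docs):
    """Return one {feature: weight} dict per document.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each feature once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        weights.append(
            {f: (c / total) * math.log(n_docs / df[f]) for f, c in counts.items()}
        )
    return weights


# Hypothetical pre-tagged documents, one tag sequence per review.
tagged_docs = [
    ["DT", "NN", "VBZ", "JJ"],
    ["DT", "NN", "VBD", "RB", "JJ"],
]
features = [bi_tags(tags) for tags in tagged_docs]
weights = tf_idf(features)
```

In this toy example a bi-tag shared by every document, such as `DT_NN`, receives zero weight, while bi-tags unique to one document receive positive weight, which is the property that makes such features useful for separating authors.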