Data Pre-Processing in Text Mining

Aksoy, Tuğçe; Çelik, Serra; Gülseçen, Sevinç

Who Runs the World: Data, Sevinç Gülseçen, Emre Akadal, Sushil Kumar Sharma (Eds.), Istanbul University Press, İstanbul, pp. 123-144, 2020

Publication Type: Chapter in Book / Research Book
Publication Date: 2020
Publisher: Istanbul University Press
City of Publication: İstanbul
Pages: pp. 123-144
Editors: Sevinç Gülseçen, Emre Akadal, Sushil Kumar Sharma
Open Archive Collection: AVESİS Open Access Collection
Affiliated with İstanbul University: Yes

Abstract

Because any user can now generate data easily and at any time, data mining is becoming increasingly important. Since
the vast majority of available data is unstructured, and most of that unstructured data is text, it is no surprise that
interest in text mining is growing and that studies in this field are abundant. However, to analyze an unstructured data
type such as text, which is quite different from machine language, the data must first be given more structure so that a
machine can process it. This is why the data pre-processing step, which accounts for a large part of the entire text
mining process, is so important. This chapter aims to explain the text pre-processing phase at a basic level, supported
by visuals. It first focuses on text mining and describes in detail the characteristics of the data being processed. It
then examines the difficulties this data creates and explains the pre-processing steps taken to overcome them. Overall,
this chapter is a descriptive review of the data pre-processing phase in text mining, covering a selection of previous
studies on the subject.
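The abstract's central point, that text must be given structure before a machine can work with it, can be illustrated with a minimal Python sketch. It assumes a simple pipeline (lowercasing, letter-only tokenization, stop-word removal, and term counting into a term-document matrix); the stop-word list, the function names `preprocess` and `term_document_matrix`, and the sample sentences are hypothetical and are not taken from the chapter.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only; real pipelines
# usually rely on a curated, language-specific list.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "it", "this"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, split it into letter-only tokens, drop stop words."""
    lowered = text.lower()
    # [^\W\d_]+ matches runs of Unicode letters; digits, underscores
    # and punctuation are discarded.
    tokens = re.findall(r"[^\W\d_]+", lowered)
    return [t for t in tokens if t not in STOP_WORDS]


def term_document_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Turn raw documents into a sorted vocabulary and a term-frequency matrix."""
    token_lists = [preprocess(d) for d in docs]
    vocabulary = sorted({t for tokens in token_lists for t in tokens})
    matrix = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # One row per document, one column per vocabulary term.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


if __name__ == "__main__":
    docs = [
        "Text mining is the mining of text.",
        "Preprocessing makes the text structured!",
    ]
    vocab, tdm = term_document_matrix(docs)
    print(vocab)  # ['makes', 'mining', 'preprocessing', 'structured', 'text']
    print(tdm)    # [[0, 2, 0, 0, 2], [1, 0, 1, 1, 1]]
```

A term-document matrix is only one possible structured representation; steps such as stemming, lemmatization, or language-specific case folding are deliberately left out of this sketch for brevity.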