Data Pre-Processing in Text Mining

Creative Commons License

Aksoy T., Çelik S., Gülseçen S.

in: Who Runs the World: Data, Sevinç Gülseçen, Emre Akadal, Sushil Kumar Sharma, Editor, Istanbul University Press, İstanbul, pp.123-144, 2020

  • Publication Type: Book Chapter / Chapter Research Book
  • Publication Date: 2020
  • Publisher: Istanbul University Press
  • City: İstanbul
  • Page Numbers: pp.123-144
  • Editors: Sevinç Gülseçen, Emre Akadal, Sushil Kumar Sharma
  • Istanbul University Affiliated: Yes


Because any user can now generate data with great ease at any time, the importance of data mining keeps growing.
The vast majority of available data is unstructured, and text outnumbers the other data types, which explains the
increasing interest in text mining and the abundance of studies in this field. However, text is an unstructured data
type quite different from machine language, so before it can be examined it must be given more structure that a
machine can work with. At this point the data pre-processing step, which accounts for a large part of the entire text
mining process, is of great importance. This chapter aims to explain the text pre-processing phase at a basic level,
supported by visuals. It first focuses on text mining and describes in detail the characteristics of the data
processed. It then examines the difficulties this data creates and explains the pre-processing steps followed to
overcome them. As a result, this chapter is a descriptive review of the data pre-processing phase in text mining,
covering some of the studies previously conducted on this subject.