Topic Modelling in Archival and Records Management: An Analysis Based on Latent Dirichlet Allocation


Güler C., Keskin İ., Sümbül S.

Current Issues in Archival Science, İshak Keskin, Ceyhan Güler, Sinan Sümbül, Editors, Istanbul University Press, İstanbul, pp. 90-104, 2025

  • Publication Type: Book Chapter / Professional Book
  • Publication Date: 2025
  • Publisher: Istanbul University Press
  • City of Publication: İstanbul
  • Pages: pp. 90-104
  • Editors: İshak Keskin, Ceyhan Güler, Sinan Sümbül
  • Istanbul University Affiliated: Yes

Abstract

Probabilistic topic models are a family of algorithms that uncover the hidden thematic structure of a collection of related documents in a given field and represent it in a compact, low-dimensional form. They are a research area of growing importance in machine learning and text mining. Machine learning and text mining are also used in archival and records management, where they help identify topics in large volumes of resources and make missing or unstudied topics visible. This study applies Latent Dirichlet Allocation (LDA), a topic modelling method that performs automatic semantic analysis, to master's and doctoral theses on archival and records management prepared in Information and Records Management (BBY) departments in Türkiye. The experimental study covered extracts of 243 theses on archival and records management prepared between 1987 and 2022, obtained from the National Thesis Center Database of the Council of Higher Education and from the thesis catalogues of the universities that host BBY departments, and was carried out with the Gibbs sampling algorithm. The study identified 20 main topics: "corporate electronic records management systems", "electronic records management", "television archives management", "Ottoman diplomacy", "ERMS and access services", "management of visual archives", "artificial intelligence and access", "digital cultural heritage management", "web accessibility of cultural heritage", "corporate electronic records management systems", "electronic archive management", "digitalization of audiovisual archive heritage", "private archives and access", "digital cultural heritage and access", "electronic archives and research services", "information and records management training", "user services in photo archives", "corporate ERMS models", "information and records management" and "archival ethics and access". Evaluation of the results showed that the algorithm successfully discovered important topics in archival and records management.
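
To illustrate the kind of procedure the abstract refers to, the following is a minimal sketch of collapsed Gibbs sampling for LDA in plain Python. It is not the authors' implementation: the abstract does not state the software, hyperparameters or preprocessing used, so the function names (`lda_gibbs`, `top_words`), the values of `alpha`, `beta` and `iterations`, and the four-document toy corpus are all assumptions made for illustration only.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenised documents.

    alpha and beta are symmetric Dirichlet priors (illustrative values,
    not taken from the study).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    ranked = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda v: -row[v])
        ranked.append([vocab[v] for v in order[:n]])
    return ranked


# Hypothetical toy corpus standing in for preprocessed thesis text.
docs = [
    "electronic records management system".split(),
    "records management electronic system".split(),
    "photo archives user services".split(),
    "visual archives photo access".split(),
]
vocab, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
```

In a study like the one described, the number of topics would be set to 20 and each topic's top-ranked words would then be read and labelled by hand; the labels themselves are an interpretive step that the sampler does not produce.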