An efficient algorithm based on cluster analysis for exploring structure of large multivariate datasets


CEVRİ M.

Computers in Biology and Medicine, vol. 197, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 197
  • Publication Date: 2025
  • DOI: 10.1016/j.compbiomed.2025.111016
  • Journal Name: Computers in Biology and Medicine
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, BIOSIS, Biotechnology Research Abstracts, CINAHL, Compendex, Computer & Applied Sciences, EMBASE, INSPEC, Library, Information Science & Technology Abstracts (LISTA)
  • Keywords: Cluster analysis, Clustering validation measures, Factor analysis, K-means, Silhouette coefficient, Teucrium
  • Istanbul University Affiliation: Yes

Abstract

In this study, cluster and factor analyses are performed on 40 species of Teucrium, a plant genus of the Lamiaceae family that comprises more than 260 species distributed worldwide, using 21 micromorphological characters. To determine the optimal number of clusters in the K-means algorithm, we use a popular clustering validation index, the silhouette index. By combining factor analysis with clustering methods validated by the silhouette measure, we have developed an efficient algorithm for clustering large multivariate datasets in Mathematica, software well known for its algebraic manipulation capabilities. The proposed methodology is also compared with the most commonly used clustering techniques and approaches. Computer simulation results show that the method plays a central role in classifying Teucrium species by their micromorphological characters, which can help identify compounds useful to the pharmaceutical industry and to medicine. Moreover, the silhouette coefficient offers a means of evaluating clustering validity for large datasets and is regarded as a more effective and accurate approach than other methods for determining the optimal number of clusters.
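The selection rule described in the abstract (run K-means for several candidate values of k and keep the k with the highest mean silhouette) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the author's Mathematica implementation: it assumes Euclidean distance and already-standardized features, uses a deterministic farthest-first (maximin) initialization chosen here only for reproducibility, and omits the factor-analysis step entirely. The function names `kmeans`, `mean_silhouette` and `best_k` are hypothetical. For each point i, the silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of its own cluster and b(i) is the mean distance to the nearest other cluster.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(points, k):
    """Assumed seeding (maximin): start at points[0], then repeatedly add
    the point farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's K-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return float("-inf")  # silhouette is undefined for a single cluster
    total = 0.0
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(euclidean(p, q) for q in own) / len(own)
        b = min(
            sum(euclidean(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def best_k(points, k_max):
    """Return (k, scores) where k in 2..k_max maximizes the mean silhouette."""
    scores = {k: mean_silhouette(points, kmeans(points, k))
              for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores
```

On three well-separated hypothetical groups of 2-D points (toy data, not the Teucrium measurements), for example `best_k([(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3), (10.0, 0.0), (10.1, 0.2), (9.8, 0.1)], 5)`, the sketch should return k = 3, because merging two groups (k = 2) or splitting one (k = 4, 5) lowers the average silhouette. Farthest-first seeding keeps the example deterministic; k-means++ with several random restarts is a more common choice in practice, and `k_max` must not exceed the number of points.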