EUROPEAN JOURNAL OF FOREST RESEARCH, vol. 143, no. 3, 2024 (SCI-Expanded)
The phenotype of a woody plant reflects its distinctive morphological
properties. Population discrimination and individual classification are
crucial for managing breeding populations and conserving genetic diversity.
Machine learning (ML) algorithms are gaining traction as powerful tools
for predicting phenotypes. The present study focuses on classifying and
clustering seeds and seedlings by their morphological characteristics
using ML algorithms. In addition, the k-means algorithm was used to
determine the optimal number of clusters, and the resulting clusters were
then compared with the actual groupings. For the seed traits, the best
classification performance was achieved by the Random Forest algorithm,
with an accuracy of 0.648 and an F1-score of 0.658. For the stone pine
seedlings, the best classification performance was obtained with the
k-Nearest Neighbors algorithm (k = 18), with an accuracy of 0.571 and an
F1-score of 0.582. The best
clustering performance was achieved with k = 2 for both the seed (average
silhouette index = 0.48) and seedling (average silhouette index = 0.51)
traits. According to the principal component analysis (PCA), the first two
principal components explained 97% and 63% of the total variance in the
seed and seedling traits, respectively. The most important features among
the seed and seedling traits were cone weight and bud set, respectively.
This study provides a foundation and motivation for future efforts in
forest management practices, particularly reforestation, yield
optimization, and breeding programs.
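The classification step summarized above can be illustrated with a minimal k-Nearest Neighbors sketch that uses the reported k = 18 and scores predictions with accuracy and F1. This is not the authors' code: the helper names (`knn_predict`, `accuracy`, `macro_f1`), the Euclidean distance, the macro-averaged F1, and the toy two-population data are all assumptions for illustration, and the Random Forest model is not reproduced here.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length trait vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=18):
    """Hypothetical helper: majority vote among the k samples closest to `query`."""
    # Sort training samples by distance to the query and keep the k nearest.
    nearest = sorted(zip(train_X, train_y), key=lambda s: euclidean(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common orders equal counts by first occurrence, so a tie goes to
    # the label that appears first among the (distance-sorted) neighbours.
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical two-population toy data (two traits per sample), not study data.
    train_X = [(i * 0.1, i * 0.1) for i in range(20)] + [(10 + i * 0.1, 10 + i * 0.1) for i in range(20)]
    train_y = ["A"] * 20 + ["B"] * 20
    test_X = [(0.5, 0.5), (10.5, 10.5)]
    test_y = ["A", "B"]
    pred = [knn_predict(train_X, train_y, q, k=18) for q in test_X]
    print(accuracy(test_y, pred), macro_f1(test_y, pred))
```

With k = 18, at least 18 labeled samples are needed for every vote to use the full neighbourhood; on real trait data the features would normally be scaled first, since Euclidean distance is sensitive to units.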
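Choosing the number of clusters by the average silhouette index can be sketched as follows: run k-means for several candidate values of k and keep the one with the highest mean silhouette width. The farthest-point initialization, the candidate range 2 to 5, the helper names (`kmeans`, `silhouette`, `choose_k`), and the eight toy points are assumptions for illustration; the abstract does not state how k-means was initialized or which values of k were tried.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for every point (ties go to the lowest index)."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j])) for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with a deterministic farthest-point initialization (assumed)."""
    # Start at the first point, then repeatedly add the point farthest from
    # all centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(tuple(max(points, key=lambda p: min(math.dist(p, c) for c in centroids))))
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to become empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members)) if members else old)
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids)


def silhouette(points, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    clusters = set(labels)
    widths = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


def choose_k(points, k_values=range(2, 6)):
    """Return the k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Two hypothetical, well-separated groups of 2-trait measurements.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    best, scores = choose_k(pts)
    print(best, {k: round(s, 2) for k, s in scores.items()})
```

On these toy points two groups are obvious, so k = 2 wins; the study's lower averages (0.48 and 0.51) indicate much less clearly separated groups in the real trait data.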
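The PCA figures above (97% and 63% for two components) are cumulative explained-variance ratios. A small NumPy sketch of that calculation follows; it assumes the traits are standardized before PCA (that is, PCA on the correlation matrix), which the abstract does not state, and the function name and sample matrix are hypothetical.

```python
import numpy as np


def explained_variance_ratio(X):
    """Share of total variance carried by each principal component, descending."""
    X = np.asarray(X, dtype=float)
    # Assumption: each trait is converted to z-scores first; a constant trait
    # would cause a division by zero here.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = np.cov(Z, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return eigvals / eigvals.sum()


if __name__ == "__main__":
    # Hypothetical measurements: rows are individuals, columns are traits.
    X = [[1.0, 2.1, 0.5], [2.0, 3.9, 0.7], [3.0, 6.2, 0.4], [4.0, 8.1, 0.9], [5.0, 9.8, 0.6]]
    ratios = explained_variance_ratio(X)
    print("first two components:", round(float(ratios[:2].sum()), 3))
```

Summing the first two ratios gives the share reported for a two-dimensional PCA; strongly correlated traits (here the first two columns) concentrate variance in the first component.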