JOURNAL OF CLASSIFICATION, 2026 (SCI-Expanded, SSCI, Scopus)
In the literature, most distance functions used for distance-based machine learning algorithms are based on the usual absolute value function on the field $\mathbb{Q}$ of rational numbers. On the other hand, Ostrowski's theorem states that every non-trivial absolute value on $\mathbb{Q}$ is equivalent to either the usual absolute value or a p-adic absolute value for some prime p. In this study, a new p-adic distance function, the p-adic Euclidean distance, is defined based on the p-adic absolute value. This paper presents the first systematic investigation of both the p-adic Euclidean distance and the p-adic Chebyshev distance (referred to in the literature as the p-adic max-norm) within the k-nearest neighbor (k-NN) framework. Together with the previously proposed p-adic Manhattan distance, these distance functions are employed in k-NN models and evaluated on 30 publicly available datasets. Their performance is compared with that of k-NN models using 14 conventional distance functions commonly found in the literature. In the analyses, binary and multi-class classification were performed on datasets containing categorical, numerical, and mixed types of predictive attributes. To find the best performance of each model, the number of neighbors k was varied from 2 to 20, and the prime parameter p of the p-adic distances was tested for primes less than 29.
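The p-adic absolute value underlying these distances can be sketched in a few lines. The snippet below is an illustrative implementation, not the paper's own code: it computes the p-adic valuation and absolute value on rationals, and a coordinate-wise max-norm distance in the spirit of the p-adic Chebyshev distance; the paper's exact definitions (in particular of the p-adic Euclidean distance) may differ.

```python
from fractions import Fraction

def p_adic_valuation(x: Fraction, p: int) -> int:
    """v_p(x): the exponent of the prime p in the factorization of x (x != 0)."""
    if x == 0:
        raise ValueError("v_p(0) is +infinity by convention")
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:   # factors of p in the numerator raise the valuation
        num //= p
        v += 1
    while den % p == 0:   # factors of p in the denominator lower it
        den //= p
        v -= 1
    return v

def p_adic_abs(x, p: int) -> float:
    """p-adic absolute value |x|_p = p**(-v_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return 0.0
    return float(p) ** (-p_adic_valuation(x, p))

def p_adic_chebyshev(u, v, p: int) -> float:
    """Max over coordinates of the p-adic absolute differences (a p-adic max-norm sketch)."""
    return max(p_adic_abs(a - b, p) for a, b in zip(u, v))
```

For example, `p_adic_abs(8, 2)` returns `0.125`, since 8 = 2^3 and |8|_2 = 2^(-3); a number highly divisible by p is p-adically small, which is what makes these metrics behave so differently from the usual Euclidean distance.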
The performance of the 17 distance functions used in the models was evaluated in terms of accuracy, recall, precision, and F1-score. In 13 of the 30 datasets, the models with p-adic distances were among the top five performers. When the datasets were analyzed separately by categorical, numerical, and mixed type, models employing p-adic distances achieved the highest classification accuracy in 14 of the 30 datasets. In the numerical and mixed datasets, the decimal precision of the attribute values was observed to influence performance. These results suggest that p-adic distances in the k-NN algorithm are particularly effective for categorical and mixed data, often surpassing many commonly used distance functions in the literature.