Spectral Clustering with Reverse Soft K-Nearest Neighbor Density Estimation

Kursun O.

World Congress on Computational Intelligence (WCCI 2010), Barcelona, Spain, 01 January 2010 identifier identifier


Spectral Clustering (sq is a kernel method to cluster data objects using eigenvectors derived from the data. One fundamental issue that cause poor cuts in SC is its sensitivity to outliers. Another fundamental problem is how to determine the kernel bandwidth from the data. In fact, these two problems are also closely related. One cannot be solved before solving the other. The answer lies in robust and nonparametric estimators of the data density. We propose Reverse Soft K-Nearest Neighbor Density Estimation (RSKNN) that determines the density around a data sample, thus this sample's potential (other used terms are weight or entropy), using all the other samples' nearest neighbors' scatter properties on the contrary to the common practice of using the nearest neighbors of a sample itself to determine its own density. The basic idea behind this can be summarized as "every sample have k neighbors that are the nearest but not every point can be in many points' k-nearest neighborhood". To demonstrate the use of it, we apply it to SC. Our package to use in SC consists of using RSKNN for estimating the density and the samples with high potential to be cluster centers helps in: 1) spectral decomposition phase to improve the generalization of the spectral cut criterion, 2) robust calculation of covariance matrix to be used in distance calculations, and 3) automatically determining the kernel bandwidth.