Combining multiple clusterings for protein structure prediction


Sakar C. O., Kursun O., Seker H., Gurgen F.

INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, vol.10, no.2, pp.162-174, 2014 (SCI-Expanded)

Abstract

Computational annotation and prediction of protein structure are very important in the post-genome era because of the existence of many different proteins, most of which are yet to be verified. Mutual information-based feature selection methods can be used to select minimal yet predictive subsets of features. However, protein features are organised into natural partitions (views). Individual feature selection that ignores the presence of these views, dismantles them, and intermixes their variables with those of other views results, at best, in a complex, uninterpretable predictive system for such multi-view datasets. In this paper, instead of selecting a subset of individual features, each feature subset (view) is passed through a clustering step so that it is represented in discrete form by its cluster indices; this makes mutual information-based methods applicable to view selection. We present experimental results on a multi-view protein dataset used to predict protein structure.
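
The idea of discretising each view through clustering and then scoring it with mutual information can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, the number of clusters, or the exact selection criterion, so the plain k-means routine, `k=2`, the toy data, and the view names `informative` and `uninformative` below are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def kmeans_labels(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns one cluster index per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centers[c])))
            for row in rows
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Hypothetical toy data: eight samples, two classes, two one-dimensional views.
classes = [0, 0, 0, 0, 1, 1, 1, 1]
views = {
    # Values separate cleanly by class.
    "informative": [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]],
    # Same values, but arranged so the clusters cut across the classes.
    "uninformative": [[0.0], [0.1], [5.0], [5.1], [0.2], [0.3], [5.2], [5.3]],
}

# Represent each view by its cluster indices, then score it against the class labels.
scores = {
    name: mutual_information(kmeans_labels(rows, k=2), classes)
    for name, rows in views.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

On this toy data the clusters of the first view coincide with the classes, so it scores 1 bit, while the clusters of the second view are independent of the classes and score 0 bits. Ranking whole views by such a score keeps each view intact, which is the interpretability argument made in the abstract.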