Gene Teams are on the Field: Evaluation of Variants in Gene-Networks Using High Dimensional Modelling

Tuna, Süha; Gulec, Çağrı; Yücesan, Emrah; Çırakoğlu, Ayşe; Arguden, Yelda

doi:10.1109/tcbb.2023.3292245

Tuna S., Gulec C., Yücesan E., Çırakoğlu A., Arguden Y. T.

IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.20, no.5, pp.2959-2969, 2023 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 20 Issue: 5
Publication Date: 2023
DOI Number: 10.1109/tcbb.2023.3292245
Journal Name: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, BIOSIS, Biotechnology Research Abstracts, Communication Abstracts, Compendex, EMBASE, INSPEC, MEDLINE, Metadex, Civil Engineering Abstracts
Pages: pp.2959-2969
Keywords: chaos game representation, enhanced multivariance products representation, Gene network analysis, high dimensional modelling, support vector machines
Open Archive Collection: AVESİS Open Access Collection
Istanbul University Affiliated: Yes

Abstract

In medical genetics, each genetic variant is evaluated as an independent entity regarding its clinical importance. However, in most complex diseases, variant combinations in specific gene networks, rather than the presence of a particular single variant, predominate. In the case of complex diseases, disease status can be evaluated by considering the success level of a team of specific variants. We propose a high dimensional modelling based method to analyse all the variants in a gene network together, which we name “Computational Gene Network Analysis” (CoGNA). To evaluate our method, we selected two gene networks, mTOR and TGF-$\beta$. For each pathway, we generated 400 control and 400 patient group samples. The mTOR and TGF-$\beta$ pathways contain 31 and 93 genes of varying sizes, respectively. We produced Chaos Game Representation images for each gene sequence to obtain 2-D binary patterns. These patterns were arranged in succession, and a 3-D tensor structure was obtained for each gene network. Features for each data sample were acquired by applying Enhanced Multivariance Products Representation to the 3-D data. Features were split into training and testing vectors. Training vectors were employed to train a Support Vector Machines classification model. We achieved more than $96\%$ and $99\%$ classification accuracies for mTOR and TGF-$\beta$ networks, respectively, using a limited number of training samples.
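
The first two steps described in the abstract, turning each gene sequence into a 2-D binary Chaos Game Representation pattern and arranging the patterns in succession as a 3-D structure, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The corner layout, the grid size, the handling of ambiguous bases, and the names `CORNERS`, `cgr_binary_image`, and `stack_gene_images` are all assumptions made for illustration; the toy sequences are invented and are not data from the study.

```python
# Assumed corner layout (a common CGR convention, not taken from the paper).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_binary_image(sequence, size=8):
    """Return a size x size grid (list of rows) with 1 in every cell
    visited by the chaos game walk over the sequence, 0 elsewhere.
    Row 0 corresponds to the smallest y values."""
    image = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # the walk starts at the centre of the unit square
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # assumption: ambiguous symbols such as N are skipped
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        # Clamp the index in case floating-point rounding yields exactly 1.0.
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] = 1
    return image


def stack_gene_images(sequences, size=8):
    """Arrange one binary pattern per gene in succession, giving a
    (number of genes) x size x size nested list."""
    return [cgr_binary_image(seq, size) for seq in sequences]


# Toy sequences, invented for illustration only.
toy_genes = ["ACGTACGT", "GGGCCCAT", "TTTTAAAA"]
tensor = stack_gene_images(toy_genes, size=8)
```

In this sketch every gene yields a grid of the same size regardless of sequence length, which is what allows genes of varying sizes to be stacked along a third axis. The subsequent feature extraction and classification steps are not covered here.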