Gene Teams are on the Field: Evaluation of Variants in Gene-Networks Using High Dimensional Modelling



Tuna S., Gulec C., YÜCESAN E., ÇIRAKOĞLU A., Arguden Y. T.

IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.20, no.5, pp.2959-2969, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 20 Issue: 5
  • Publication Date: 2023
  • Doi Number: 10.1109/tcbb.2023.3292245
  • Journal Name: IEEE/ACM Transactions on Computational Biology and Bioinformatics
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, BIOSIS, Biotechnology Research Abstracts, Communication Abstracts, Compendex, EMBASE, INSPEC, MEDLINE, Metadex, Civil Engineering Abstracts
  • Page Numbers: pp.2959-2969
  • Keywords: chaos game representation, enhanced multivariance products representation, Gene network analysis, high dimensional modelling, support vector machines
  • Istanbul University Affiliated: Yes

Abstract

In medical genetics, each genetic variant is evaluated as an independent entity with regard to its clinical importance. However, in most complex diseases, combinations of variants in specific gene networks, rather than the presence of a particular single variant, predominate. In such diseases, disease status can be evaluated by considering the success level of a team of specific variants. We propose a method based on high dimensional modelling that analyses all the variants in a gene network together, which we name “Computational Gene Network Analysis” (CoGNA). To evaluate our method, we selected two gene networks, mTOR and TGF-$\beta$. For each pathway, we generated 400 control and 400 patient group samples. The mTOR and TGF-$\beta$ pathways contain 31 and 93 genes of varying sizes, respectively. We produced Chaos Game Representation images for each gene sequence to obtain 2-D binary patterns. These patterns were arranged in succession to form a 3-D tensor structure for each gene network. Features for each data sample were extracted by applying Enhanced Multivariance Products Representation to the 3-D data. The features were split into training and testing vectors, and the training vectors were used to train a Support Vector Machines classification model. We achieved classification accuracies of more than $96\%$ and $99\%$ for the mTOR and TGF-$\beta$ networks, respectively, using a limited number of training samples.
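
The first two steps of the pipeline described in the abstract (a Chaos Game Representation pattern per gene, then stacking the patterns into a 3-D structure) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the corner assignment for A, C, G and T, the image size, the handling of ambiguous bases, and the function names `cgr_binary_image` and `network_tensor` are all assumptions made for illustration. The Enhanced Multivariance Products Representation and Support Vector Machines steps are not shown.

```python
# Hypothetical sketch: the corner layout and image size are assumptions,
# not values taken from the paper.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_binary_image(sequence, size=64):
    """Return a size x size binary pattern (list of lists) for one sequence."""
    image = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # the chaos game starts at the centre of the unit square
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # assumption: skip ambiguous symbols such as N
        # Move halfway from the current point towards the base's corner.
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] = 1  # binary pattern: mark the visited cell
    return image


def network_tensor(sequences, size=64):
    """Arrange one CGR pattern per gene in succession: genes x size x size."""
    return [cgr_binary_image(seq, size) for seq in sequences]


if __name__ == "__main__":
    # Toy sequences standing in for the gene sequences of one sample.
    tensor = network_tensor(["ACGTACGT", "GGGGCCCC", "ATATNATAT"], size=8)
    print(len(tensor), len(tensor[0]), len(tensor[0][0]))
```

Because every gene maps to an image of the same size regardless of its length, genes of varying sizes can be stacked along a common axis, which is what makes a single 3-D structure per gene network possible in this sketch.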