Hybridized model selection with Gifi system for categorical data using the genetic algorithm and information complexity


Karaman E., Arıcıgil Çilan Ç., Bozdogan H.

ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS, vol.57, pp.1-12, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 57
  • Publication Date: 2023
  • DOI Number: 10.1016/j.elerap.2022.101221
  • Journal Name: ELECTRONIC COMMERCE RESEARCH AND APPLICATIONS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus, ABI/INFORM, Business Source Elite, Business Source Premier, Compendex, INSPEC
  • Page Numbers: pp.1-12
  • Istanbul University Affiliated: Yes

Abstract

In cross-disciplinary fields such as the social and behavioral sciences, biology, e-commerce, econometrics, medical data mining, and engineering applications, the available data are mostly composed of categorical, continuous, and mixed data types with both categorical and continuous variables. Modeling such data structures is challenging because it is difficult to specify the underlying probability distributional assumptions. This paper proposes a novel categorical regression (CATREG) model that uses the optimal scaling technique in the Gifi system to resolve this problem by transforming the categorical data into continuous data and then analyzing the data in the new transformed Gifi space. This transformation preserves the scaling properties of the original variables without loss of information, and the mapping is one-to-one and onto, unlike the kernel mapping into feature space in machine learning. We introduce hybridized model selection in the CATREG model via the information complexity (ICOMP) criterion together with the genetic algorithm (GA), and provide interpretable results. Two real numerical examples are provided: the first studies the effects of cell phone usage on the sleep patterns of individuals, and the second builds a predictive model of e-commerce for the new car market. In both examples, the best subset of predictor variables is selected to build an optimal predictive model. Our results show the efficiency and versatility of the proposed approach.
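
To make the idea of transforming categorical data into continuous data concrete, the following is a minimal Python sketch of nominal quantification for a single predictor. It is a simplified illustration, not the paper's procedure: it assumes that each category can be scored by the mean response within that category and then standardized, which is the least-squares quantification in the one-predictor case. Full CATREG alternates such steps across several predictors, and the ICOMP and GA components are not shown. The function name quantify_nominal and the toy data are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def quantify_nominal(categories, response):
    """Score each category by its mean response, then standardize.

    Simplified one-predictor illustration of optimal scaling for a
    nominal variable; not the full alternating least squares of CATREG.
    """
    # Collect the response values observed for each category.
    groups = defaultdict(list)
    for cat, y in zip(categories, response):
        groups[cat].append(y)

    # Raw quantification: the category mean of the response.
    raw = {cat: fmean(ys) for cat, ys in groups.items()}

    # Standardize the transformed variable to mean 0 and variance 1.
    scored = [raw[cat] for cat in categories]
    mu, sd = fmean(scored), pstdev(scored)
    if sd == 0:
        raise ValueError("all categories received the same score")
    return {cat: (value - mu) / sd for cat, value in raw.items()}


# Hypothetical toy data (not from the paper).
usage = ["low", "high", "medium", "high", "low", "medium"]
sleep_hours = [8.0, 5.5, 7.0, 6.0, 7.5, 6.5]

scores = quantify_nominal(usage, sleep_hours)
transformed = [scores[cat] for cat in usage]
```

Here transformed is a numeric version of the categorical variable that can enter an ordinary regression. In this sketch two categories with equal mean responses would receive the same score, so it should not be read as demonstrating the one-to-one property described above.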