Basics of Genetic Programming: An Overview in the Context of the Relationship Between Genetic Programming And Econometrics

Karakaş Geyik S., Satman M. H.

in: Ekonometride Güncel Yöntemler ve Uygulamalar, Nilgün Çil, Editor, IU Press, Bloomington (IN), USA , İstanbul, pp.177-205, 2021

  • Publication Type: Book Chapter / Chapter Vocational Book
  • Publication Date: 2021
  • Publisher: IU Press, Bloomington (IN), USA 
  • City: İstanbul
  • Page Numbers: pp.177-205
  • Editors: Nilgün Çil, Editor
  • Istanbul University Affiliated: Yes


This study provides a short introduction and an overview of the basics of genetic programming (GP) and its applications in the context of econometrics. In other words, the aim of this study is to gain a deep insight into the relationship between GP and econometrics. Accordingly, first, the basic definition and development of GP are introduced. After the literature review, GP applications in the areas of engineering, electronics, image processing, signal processing, business, economics, financial time series, medicine, biology, entertainment, and computer games are presented. In this part, we aim to highlight the results of GP applications that produce human-competitive results and new inventions, followed by working principles of GP (the basic representation schemes, initialization, and operators used in GP). Then, we discuss how econometric problems can be handled as a GP problem. We showed that GP can successfully solve several econometric problems by producing new inventions and human-competitive results.

GP is a family of searching tools in which the basic aim is to generate mathematical expressions or computer programs between the inputs fed and output. In other words, GP explores the rules in terms of computer programs using given input and output data via evolutionary optimization. GP starts with a population of randomly chosen mathematical expressions or computer programs. The quality of an expression is measured by the fitness or cost, which indicates the difference between the generated and real outputs. Then, the best solutions are recombined to generate offspring. An offspring is a cut and paste copy of the two best candidate solutions. A good offspring is expected by recombining two fittest parents. After generating new candidates, mutations are applied on the constants or functions with small probabilities. Mutations result in small and significant changes depending on the location of the mutated elements in expressions. Small changes perform a local search, whereas significant changes are essential for searching other sides of the search space in terms of optimization. After several steps, generated expressions form a new population. The search process terminates after the meeting the stopping criteria.

GP develops computer programs by optimization, whereas genetic algorithms are generally used for problems in which the main objective is to find a global extremum of a function. Genetic algorithms encode decision variables using binary, integer, permutation, or floatingpoint encoding–decoding schemes, whereas GP generally uses abstract syntax trees, postfix and prefix representations, Java bytecodes, or assembly language opcodes. Because GP develops mathematical expressions and programs, GP tasks are computer-aided, and an interpreter is always needed to evaluate generated offspring for calculating fitness values in runtime. 

In econometrics; the tasks of estimating model parameters, testing the significanceofparameters, and performing forecasts are nontrivial. In numerous cases, researchers face difficulties in selecting the best model, functional structure, correct combination of lagged values or independent variables, etc. The problems of model and variable selections are handled separately or together in the literature. When the number of independent variables in variable-pool increases, selecting the k correct variables among K variables results in a combinatorial problem, where k ≤ K. Furthermore, when the functional form is undefined by theory, nothing can be done, except trying all possible forms with unknown variables. This is a “searching the needle in the haystack” type of problem. GP is broadly used in such problems under the title of symbolic regression. In general terms, the main question of symbolic regression is: “what is the correct functional form with correct independent variables that explains a dependent variable well?”. Besides the estimation process and techniques used, numerous model suggestions can satisfy a model selection criteria used in the classic econometric literature.

Econometric models are generally represented by mathematical expressions. In programming languages, mathematical expressions are also computer programs. For example, piecewise functions are basically represented using if-then-else statements or expressions in several programming languages. Similarly, piecewise regressions, nonlinear time-series models, as well as buy and hold strategies in finance are combinations of mathematical expressions including decisions as computer programs. Developing the best strategy and determining the best time for buying a stock is a well-studied topic in the GP literature.

GP is neither the end of the whole story nor the magic that solves problems. First, GP tends to generate more than sufficient expressions as solutions; several researchers have tried deciphering this phenomenon. Second, an automatic process can almost always find an overfitted set of solutions describing the relations between variables. As in other econometric studies, researchers should be attentive to select interpretable solutions to the concerned problem. In other words, the model that minimizes selection criteria may not be the proper model. Moreover, GP is successful in model and variable selections, rule detection, discovering new location-comparison tests, and inventing patentable things in a human-competitive way.