COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, vol.41, no.5, pp.644-652, 2012 (SCI-Expanded)
The authors introduce an algorithm for estimating the least trimmed squares (LTS) parameters in large data sets. The algorithm performs a genetic algorithm search to form a basic subset that is unlikely to contain outliers. Rousseeuw and van Driessen (2006) suggested drawing independent basic subsets and iterating C-steps many times to minimize LTS criterion. The authors 'algorithm constructs a genetic algorithm to form a basic subset and iterates C-steps to calculate the cost value of the LTS criterion. Genetic algorithms are successful methods for optimizing nonlinear objective functions but they are slower in many cases. The genetic algorithm configuration in the algorithm can be kept simple because a small number of observations are searched from the data. An R package is prepared to perform Monte Carlo simulations on the algorithm. Simulation results show that the performance of the algorithm is suitable for even large data sets because a small number of trials is always performed.