Impact Analysis of COVID-19 Pandemic on Istanbul Traffic with Big Data Tools

Alcan U., KAÇAR F.

ELECTRICA, vol.22, no.2, pp.226-236, 2022 (ESCI)

  • Publication Type: Article
  • Volume: 22 Issue: 2
  • Publication Date: 2022
  • Doi Number: 10.54614/electrica.2022.210005
  • Journal Name: ELECTRICA
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus, TR DİZİN (ULAKBİM)
  • Page Numbers: pp.226-236
  • Keywords: Apache Spark, big data, Elasticsearch, Kibana, traffic analysis
  • Istanbul University Affiliated: No


With the spread of the internet, people have started to produce data in almost everything they do. Countless everyday activities, such as sending messages on WhatsApp, sharing photos on Instagram, searching on Google, and sending electronic mail (email), create a huge data source, and this process is repeated every single day. Such dense and varied data also leads to information garbage, and analyzing this dump with traditional technologies has become a problem of its own. Large companies that want to analyze this mass of information, understand the behavior of their customers, and set their strategies according to the results have come up with the concept of big data. Big data is data gathered from different sources, such as social media shares, sensor data, photo archives, call records obtained from Global System for Mobile Communications (GSM) operators, and search engine statistics, and converted into a meaningful and processable form [1]. In this study, the effect of the coronavirus disease 2019 (COVID-19) pandemic, an important problem of today, on Istanbul traffic has been examined using the power of big data technologies. In this context, the 2020 hourly traffic index dataset, published openly by Istanbul Metropolitan Municipality [2], and a curfew time dataset are examined. Apache Spark, a new generation data processing tool, has been used to analyze these datasets. First, a general analysis of the 2020 Istanbul traffic index data was carried out with Apache Spark; the results were then checked for association with the curfew time dataset, and an impact analysis was performed. Elasticsearch has been used to store the processed data, and Kibana has been used for data visualization.
At the end of the study, machine learning applications on traffic density were developed using the Application Programming Interface (API) of Apache Spark's machine learning library, with logistic regression, decision tree, random forest, gradient-boosted tree-based OneVsRest, and linear support vector machine-based OneVsRest methods.
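
The impact analysis described in the abstract compares hourly traffic index values against curfew periods. The sketch below illustrates that idea only. It is not the authors' Spark job: it uses plain Python from the standard library so that it stays self-contained. The record layout, the function names `in_curfew` and `curfew_impact`, and the sample values are all assumptions made for illustration and are not taken from the paper's datasets.

```python
from datetime import datetime
from statistics import mean


def in_curfew(ts, curfews):
    """Return True if timestamp ts falls inside any [start, end) curfew interval."""
    return any(start <= ts < end for start, end in curfews)


def curfew_impact(records, curfews):
    """Split hourly (timestamp, traffic_index) records by curfew status
    and return the mean traffic index of each group."""
    inside, outside = [], []
    for ts, index in records:
        (inside if in_curfew(ts, curfews) else outside).append(index)
    return {
        "curfew_mean": mean(inside) if inside else None,
        "normal_mean": mean(outside) if outside else None,
    }


# Toy values invented for illustration; not real measurements.
records = [
    (datetime(2020, 4, 10, 18), 60),
    (datetime(2020, 4, 11, 18), 10),
    (datetime(2020, 4, 12, 18), 14),
    (datetime(2020, 4, 13, 18), 56),
]
# Hypothetical curfew interval covering two full days.
curfews = [(datetime(2020, 4, 11, 0), datetime(2020, 4, 13, 0))]

result = curfew_impact(records, curfews)
print(result)
```

In a Spark setting, the same comparison would typically be expressed as a join between the two datasets followed by a grouped aggregation; the loop above only stands in for that step at small scale.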