IRSYSC 2024, VIII. International Researchers, Statisticians, and Young Statisticians Congress, November 28-30, 2024, Adana, Türkiye, p. 127 (Abstract)
Synthetic data is data generated by artificial intelligence to reduce the cost of research and save
time when real data is not available. Synthetic data includes not only numerical data
and codes, but also text, images, audio, and video recordings. A review of the studies in the
literature that use synthetic data shows that it is mainly used in the
fields of tourism, health, finance, automotive, and market research. Generative artificial
intelligence is utilized in the production of synthetic data. Techniques such as
Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Large Language
Models (LLMs), Retrieval-Augmented Generation (RAG), Reinforcement Learning from Human
Feedback (RLHF), and Agent-Based Modeling (ABM) are generally used to generate synthetic
data. In quantitative and qualitative research, the use of synthetic data has become widespread
in cases where sufficient data cannot be collected and where the confidentiality of the study is a priority.
In qualitative research, where recruiting participants is difficult and face-to-face interviews
are costly, synthetic data can simulate real-world scenarios, interviews, and observations
with the help of artificial intelligence. Large language models can likewise be used to
create synthetic data sets that mimic real data. In quantitative
research, synthetic data is used to impute missing observations with artificial intelligence
systems. This study reviews the literature on synthetic data used in quantitative and
qualitative research.
Key Words: Synthetic Data, Quantitative-Qualitative Research, Generative AI