Evaluating a subset of ancestry informative SNPs for discriminating among Southwest Asian and circum-Mediterranean populations

Bulbul O., Cherni L., Khodjet-El-Khil H., Rajeevan H., Kidd K. K.

FORENSIC SCIENCE INTERNATIONAL-GENETICS, vol.23, pp.153-158, 2016 (SCI-Expanded)


Many published sets of single nucleotide polymorphisms (SNPs) and/or insertion-deletion polymorphisms (InDels) can serve as ancestry informative markers (AIMs) to distinguish among continental regions of the world. To focus on Southwest Asian ancestry, we started with the Kidd Lab panel of 55 ancestry-informative SNPs (AISNPs) because it already provided good global reference data (FROG-kb: frog.med.yale.edu) in a set of 73 population samples distinguishing at least 8 biogeographic clusters of populations. This panel serves as a good first-tier ancestry panel. We are now interested in identifying region-specific second-tier panels for more refined distinction among populations within each of the global regions. We have begun by studying the global region centered on Southwest Asia and the region encompassing the Mediterranean Sea. We added 10 populations from North Africa, Turkey, and Iran to 31 of the original 73 populations and eleven 1000 Genomes Phase 3 populations, for a total of 3129 individuals from 52 populations, all typed for the 55 AISNPs. We then identified the subset of the 55 AISNPs that is most informative for this region of the world, using heatmap, Fst, and informativeness analyses to eliminate SNPs that are essentially redundant or provide no information among populations in this region, which reduced the number of SNPs to 32. STRUCTURE and PCA analyses show that the remaining 32 SNPs identify the North African cluster and appropriately group the Turkish and Iranian samples with the Southwest Asian cluster. These markers provide the basis for building an improved, optimized panel of AISNPs that provides additional information on differences among populations in this part of the world. The data have also allowed an examination of the accuracy of ancestry inference based on the 32 SNPs for the newly studied populations from this region. The likelihood ratio approach to ancestry inference embodied in FROG-kb provides highly significant population assignments within one order of magnitude for each individual in the Turkish, Iranian, and Tunisian populations. (C) 2016 The Authors. Published by Elsevier Ireland Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
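The likelihood ratio idea mentioned at the end of the abstract can be illustrated with a minimal Python sketch. It is not the FROG-kb implementation. It assumes Hardy-Weinberg genotype probabilities, independence across SNPs, and a simple clamp on allele frequencies to avoid taking the log of zero. The SNP names, population labels, allele frequencies, and function names are all hypothetical and chosen only for illustration.

```python
import math

def genotype_log10_likelihood(genotype, freqs, floor=0.01):
    """log10 probability of a multi-SNP genotype in one population.

    genotype: dict mapping SNP id -> copies (0, 1, or 2) of a chosen allele.
    freqs:    dict mapping SNP id -> frequency of that allele in the population.
    floor:    assumed clamp so a frequency of 0 or 1 never yields log10(0).
    """
    total = 0.0
    for snp, copies in genotype.items():
        p = min(max(freqs[snp], floor), 1.0 - floor)
        q = 1.0 - p
        # Hardy-Weinberg genotype probabilities (an assumption of this sketch).
        if copies == 2:
            prob = p * p
        elif copies == 1:
            prob = 2.0 * p * q
        else:
            prob = q * q
        # SNPs are treated as independent, so log-probabilities add.
        total += math.log10(prob)
    return total

def rank_populations(genotype, pop_freqs):
    """Return (population, log10 likelihood, log10 ratio to best), best first."""
    scores = {pop: genotype_log10_likelihood(genotype, f)
              for pop, f in pop_freqs.items()}
    best = max(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(pop, score, best - score) for pop, score in ranked]

def within_one_order(ranked, max_log10_ratio=1.0):
    """Populations whose likelihood is within 10x of the most likely one."""
    return [pop for pop, _, diff in ranked if diff <= max_log10_ratio]

# Hypothetical allele frequencies for three made-up populations.
pop_freqs = {
    "PopA": {"snp1": 0.9, "snp2": 0.2, "snp3": 0.7},
    "PopB": {"snp1": 0.8, "snp2": 0.3, "snp3": 0.6},
    "PopC": {"snp1": 0.1, "snp2": 0.9, "snp3": 0.2},
}
# Hypothetical individual: copies of the counted allele at each SNP.
individual = {"snp1": 2, "snp2": 0, "snp3": 1}

ranked = rank_populations(individual, pop_freqs)
for pop, score, diff in ranked:
    print(f"{pop}: log10 L = {score:.3f}, log10 ratio to best = {diff:.3f}")
print("Within one order of magnitude of best:", within_one_order(ranked))
```

In this toy data the individual is most likely under PopA, PopB falls within one order of magnitude of PopA, and PopC is far less likely. A real analysis would use reference allele frequencies for the 32 SNPs rather than three invented markers.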