XhetRel: a pipeline for X heterozygosity and relatedness analysis of sequencing data


Salman B., Bebek N., Uğur İşeri S.

Bioinformatics Advances, vol.6, no.1, 2026 (ESCI, Scopus)

  • Publication Type: Article / Article
  • Volume: 6 Issue: 1
  • Publication Date: 2026
  • Doi Number: 10.1093/bioadv/vbag002
  • Journal Name: Bioinformatics Advances
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus, BIOSIS, Directory of Open Access Journals
  • Istanbul University Affiliated: Yes

Abstract

Motivation: Verification of sample sex is an essential quality control step in next-generation sequencing (NGS) studies and is typically assessed from genomic data. Clustering individuals by X chromosome heterozygosity (Xhet) and incorporating relatedness estimates offers a practical first-pass screen for potential sex label errors, sample mix-ups, and pedigree inconsistencies. To better interpret Xhet-based patterns, we further investigated their biological and technical origins using the 1000 Genomes Project dataset.

Results: We developed XhetRel, a user-friendly workflow and notebook application that computes Xhet and performs relatedness estimation directly from VCF files. As a fully genotype-based approach, XhetRel enables both sex-based clustering and relatedness assessment as an initial quality control (QC) step in NGS. XhetRel serves groups without bioinformatics infrastructure, users requiring a browser-based QC tool, and workflow developers seeking a modular Nextflow component. Our investigation into the sources of Xhet variation highlighted important limitations in sequencing and variant-calling approaches. In particular, we identified specific pseudogenes and gene clusters, such as SLC25A5 and the GAGE cluster, as recurrent contributors to misleading variant allele fractions.
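
To make the idea of computing Xhet from VCF genotypes concrete, the following is a minimal Python sketch and not XhetRel's actual implementation. It assumes Xhet is the fraction of heterozygous calls among called diploid genotypes on the X chromosome; the abstract does not state the exact definition, so this is an assumption. The function name `x_heterozygosity`, the contig names `X` and `chrX`, and the toy records are hypothetical. The sketch does not exclude pseudoautosomal regions or apply any quality filters.

```python
from collections import defaultdict


def x_heterozygosity(vcf_lines, x_names=("X", "chrX")):
    """Return {sample: heterozygous calls / called diploid genotypes} on chrX.

    Assumed definition for illustration only; not taken from XhetRel.
    """
    samples = []
    het = defaultdict(int)
    called = defaultdict(int)
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if fields[0] not in x_names:
            continue  # keep only X chromosome records
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        for sample, entry in zip(samples, fields[9:]):
            gt = entry.split(":")[gt_idx]
            alleles = gt.replace("|", "/").split("/")
            if len(alleles) != 2 or "." in alleles:
                continue  # assumption: ignore missing and haploid calls
            called[sample] += 1
            if alleles[0] != alleles[1]:
                het[sample] += 1
    return {s: het[s] / called[s] for s in samples if called[s] > 0}


# Toy records (hypothetical data, tab-separated as in a VCF body)
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "X\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1",
    "X\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t0/0",
    "1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/1\t0/1",
    "X\t400\t.\tT\tC\t.\tPASS\t.\tGT\t0|0\t./.",
]
xhet = x_heterozygosity(example)
for sample, value in xhet.items():
    print(sample, round(value, 3))
```

In this toy input, S1 is heterozygous at two of its three called X sites and S2 at none of its two called sites. Outside the pseudoautosomal regions, individuals with a single X chromosome are expected to show Xhet close to zero, which is why the statistic separates samples by sex; the clustering threshold is data-dependent and is left out of the sketch.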