TY - JOUR
T1 - Katdetectr
T2 - an R/bioconductor package utilizing unsupervised changepoint analysis for robust kataegis detection
AU - Hazelaar, Daan M.
AU - Van Riet, Job
AU - Hoogstrate, Youri
AU - Van De Werken, Harmen J.G.
N1 - Publisher Copyright:
© 2023 The Author(s). Published by Oxford University Press GigaScience.
PY - 2023
Y1 - 2023
N2 - Background: Kataegis refers to the occurrence of regional genomic hypermutation in cancer and is a phenomenon that has been observed in a wide range of malignancies. A kataegis locus constitutes a genomic region with a high mutation rate (i.e., a higher frequency of closely interspersed somatic variants than the overall mutational background). It has been shown that kataegis is of biological significance and possibly clinically relevant. Therefore, an accurate and robust workflow for kataegis detection is paramount. Findings: Here we present Katdetectr, an open-source R/Bioconductor-based package for the robust yet flexible and fast detection of kataegis loci in genomic data. In addition, Katdetectr houses functionalities to characterize and visualize kataegis and provides results in a standardized format useful for subsequent analysis. In brief, Katdetectr imports industry-standard formats (MAF, VCF, and VRanges), determines the intermutation distance of the genomic variants, and performs unsupervised changepoint analysis utilizing the Pruned Exact Linear Time search algorithm followed by kataegis calling according to user-defined parameters. We used synthetic data and an a priori labeled pan-cancer dataset of whole-genome sequenced malignancies for the performance evaluation of Katdetectr and 5 publicly available kataegis detection packages. Our performance evaluation shows that Katdetectr is robust regarding tumor mutational burden and shows the fastest mean computation time. Additionally, Katdetectr reveals the highest accuracy (0.99, 0.99) and normalized Matthews correlation coefficient (0.98, 0.92) of all evaluated tools for both datasets. Conclusions: Katdetectr is a robust workflow for the detection, characterization, and visualization of kataegis and is available on Bioconductor: https://doi.org/doi:10.18129/B9.bioc.katdetectr.
AB - Background: Kataegis refers to the occurrence of regional genomic hypermutation in cancer and is a phenomenon that has been observed in a wide range of malignancies. A kataegis locus constitutes a genomic region with a high mutation rate (i.e., a higher frequency of closely interspersed somatic variants than the overall mutational background). It has been shown that kataegis is of biological significance and possibly clinically relevant. Therefore, an accurate and robust workflow for kataegis detection is paramount. Findings: Here we present Katdetectr, an open-source R/Bioconductor-based package for the robust yet flexible and fast detection of kataegis loci in genomic data. In addition, Katdetectr houses functionalities to characterize and visualize kataegis and provides results in a standardized format useful for subsequent analysis. In brief, Katdetectr imports industry-standard formats (MAF, VCF, and VRanges), determines the intermutation distance of the genomic variants, and performs unsupervised changepoint analysis utilizing the Pruned Exact Linear Time search algorithm followed by kataegis calling according to user-defined parameters. We used synthetic data and an a priori labeled pan-cancer dataset of whole-genome sequenced malignancies for the performance evaluation of Katdetectr and 5 publicly available kataegis detection packages. Our performance evaluation shows that Katdetectr is robust regarding tumor mutational burden and shows the fastest mean computation time. Additionally, Katdetectr reveals the highest accuracy (0.99, 0.99) and normalized Matthews correlation coefficient (0.98, 0.92) of all evaluated tools for both datasets. Conclusions: Katdetectr is a robust workflow for the detection, characterization, and visualization of kataegis and is available on Bioconductor: https://doi.org/doi:10.18129/B9.bioc.katdetectr.
UR - https://www.scopus.com/pages/publications/85174750467
U2 - 10.1093/gigascience/giad081
DO - 10.1093/gigascience/giad081
M3 - Article
C2 - 37848617
AN - SCOPUS:85174750467
SN - 2047-217X
VL - 12
JO - GigaScience
JF - GigaScience
M1 - giad081
ER -