Machine learning-based somatic variant calling in cell-free DNA of metastatic breast cancer patients using large NGS panels

Elisabeth M. Jongbloed, Maurice P.H.M. Jansen, Vanja de Weerd, Jean A. Helmijr, Corine M. Beaufort, Marcel J.T. Reinders, Ronald van Marion, Wilfred F.J. van IJcken, Gabe S. Sonke, Inge R. Konings, Agnes Jager, John W.M. Martens, Saskia M. Wilting, Stavros Makrodimitris*

*Corresponding author for this work

Research output: Contribution to journal (Article, Academic, peer-review)

Next-generation sequencing of cell-free DNA (cfDNA) is a promising method for treatment monitoring and therapy selection in metastatic breast cancer (MBC). However, distinguishing tumor-specific variants from sequencing artefacts and germline variation with a low false discovery rate is challenging when using large targeted sequencing panels that cover many tumor suppressor genes. To address this, we built a machine learning model to remove false positive variant calls and augmented it with additional filters to ensure that the selected variants are tumor-derived. We used cfDNA of 70 MBC patients profiled with both the small targeted Oncomine breast panel (Thermofisher) and the much larger Qiaseq Human Breast Cancer Panel (Qiagen). The model was trained on the regions common to both panels, using Oncomine hotspot mutations as ground truth. Applied to Qiaseq data, it achieved 35% sensitivity and 36% precision, outperforming basic filtering. For 20 patients, we used germline DNA to filter for somatic variants and obtained 245 variants in total, whereas our model found seven variants, six of which were also detected using the germline strategy. In ten tumor-free individuals, our method detected only one variant in total (potentially germline), in contrast to 521 variants detected without our model. These results indicate that our model largely detects somatic variants.
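The sensitivity and precision reported above follow the standard definitions for comparing a call set against a ground truth: sensitivity = TP / (TP + FN) and precision = TP / (TP + FP). The sketch below illustrates these definitions and a simplified germline-subtraction step. It is a minimal illustration under stated assumptions, not the authors' pipeline: the `Variant` key (chromosome, position, reference, alternate allele), the helper names, and the treatment of germline filtering as a plain set difference are all hypothetical, and the ground truth is assumed to be restricted to the regions both panels share.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def sensitivity_precision(called: set[Variant], truth: set[Variant]) -> tuple[float, float]:
    """Compare a call set with a ground-truth set covering the same regions.

    sensitivity = TP / (TP + FN), precision = TP / (TP + FP).
    """
    tp = len(called & truth)   # calls confirmed by the ground truth
    fp = len(called - truth)   # calls absent from the ground truth
    fn = len(truth - called)   # ground-truth variants that were missed
    sensitivity = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if called else 0.0
    return sensitivity, precision


def remove_germline(cfdna_calls: set[Variant], germline_calls: set[Variant]) -> set[Variant]:
    """Keep only cfDNA calls not also found in matched germline DNA (simplified assumption)."""
    return cfdna_calls - germline_calls


if __name__ == "__main__":
    # Toy coordinates for illustration only, not real mutations.
    truth = {Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T")}
    called = {Variant("chr1", 100, "A", "G"), Variant("chr3", 300, "G", "A")}
    print(sensitivity_precision(called, truth))  # (0.5, 0.5)
```

In the toy example, one of two true variants is recovered and one of two calls is correct, so both metrics equal 0.5. A real comparison would also need to normalize variant representation (for example, left-aligning indels) before matching, which this sketch omits.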

Original language: English
Article number: 10424
Number of pages: 11
Journal: Scientific Reports
Issue number: 1
Publication status: E-pub ahead of print, 27 Jun 2023

Bibliographical note

Funding Information:
This study was funded by the Convergence Health and Technology program of Erasmus University Medical Center and Delft University of Technology, the Dutch Cancer Society (KWF12039), and Breast Cancer Now’s Catalyst Programme (Grant Ref: 2018NovPCC100), which is supported by funding from Pfizer.

Publisher Copyright:
© 2023, The Author(s).
