Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores

Bjarni J. Vilhjálmsson*, Psychosis Endophenotypes International Consortium, Wellcome Trust Case Control Consortium, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Hereditary Breast and Ovarian Cancer Research Group Netherlands (HEBON), Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study, Jian Yang, Hilary K. Finucane, Alexander Gusev, Sara Lindström, Stephan Ripke, Giulio Genovese, Po Ru Loh, Gaurav Bhatia, Ron Do, Tristan Hayeck, Hong Hee Won, Benjamin M. Neale, Aiden Corvin, James T.R. Walters, Kai How Farh, Peter A. Holmans, Phil Lee, Brendan Bulik-Sullivan, David A. Collier, Hailiang Huang, Tune H. Pers, Ingrid Agartz, Esben Agerbo, Margot Albus, Madeline Alexander, Farooq Amin, Silviu A. Bacanu, Martin Begemann, Richard A. Belliveau, Judit Bene, Sarah E. Bergen, Elizabeth Bevilacqua, Tim B. Bigdeli, Donald W. Black, Richard Bruggeman, Nancy G. Buccola, Randy L. Buckner, William Byerley, Wiepke Cahn, Lyudmila Georgieva, Tao Li, Jing Qin Wu, Roel A. Ophoff, Danielle Posthuma, Peter Kraft, Muriel Adank, Hanne Meijers-Heijboer, María José Sánchez, Andre G. Uitterlinden

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

799 Citations (Scopus)

Abstract

Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, prediction R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
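The contrast drawn above between pruning followed by thresholding (P+T) and posterior-mean effect sizes can be sketched with NumPy. This is a minimal illustration, not the LDpred software. The abstract does not specify the prior, so the sketch covers only a Gaussian (infinitesimal) prior, for which the posterior mean has a closed form; LDpred's main model instead uses a point-normal prior whose posterior mean is approximated by MCMC. The function names, the heritability parameter `h2`, the optional genome-wide marker count `m_total`, the greedy clumping rule, and the model of standardized genotypes with beta_j ~ N(0, h2/M) and marginal estimates beta_hat ~ N(D·beta, D/N) are all assumptions introduced here. Under that model the posterior mean is (M/(N·h2)·I + D)^-1 · beta_hat, where D is the reference-panel LD (correlation) matrix and N the training sample size.

```python
import numpy as np


def prune_and_threshold(beta_marg, pvals, ld, r2_max=0.2, p_thresh=5e-8):
    """Hypothetical P+T sketch: greedy LD clumping ordered by p value.

    Markers are visited from most to least significant. A marker is kept if
    it passes the p value threshold and its squared correlation with every
    marker already kept is below r2_max. Kept markers retain their marginal
    effect; every other marker gets weight 0, so its information is discarded.
    """
    beta_marg = np.asarray(beta_marg, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    weights = np.zeros_like(beta_marg)
    kept = []
    for j in np.argsort(pvals):
        if pvals[j] >= p_thresh:
            break  # all remaining markers are even less significant
        if all(ld[j, k] ** 2 < r2_max for k in kept):
            kept.append(j)
            weights[j] = beta_marg[j]
    return weights


def ldpred_inf(beta_marg, ld, n, h2, m_total=None):
    """Posterior mean effects under an assumed Gaussian (infinitesimal) prior.

    Solves (M / (n * h2) * I + ld) x = beta_marg. M should be the genome-wide
    marker count (m_total) when this is applied one LD window at a time; by
    default it is the number of markers passed in.
    """
    beta_marg = np.asarray(beta_marg, dtype=float)
    m = beta_marg.shape[0]
    m_eff = m if m_total is None else m_total
    shrink = m_eff / (n * h2)
    # np.linalg.solve avoids forming the matrix inverse explicitly.
    return np.linalg.solve(shrink * np.eye(m) + ld, beta_marg)


def polygenic_score(genotypes, weights):
    """Risk score per individual: standardized genotype matrix times weights."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)


if __name__ == "__main__":
    # Toy LD matrix: markers 0 and 1 are strongly correlated, marker 2 is not.
    ld = np.array([[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]])
    beta = np.array([0.2, 0.25, 0.05])
    pvals = np.array([1e-4, 1e-6, 0.3])
    print("P+T weights:      ", prune_and_threshold(beta, pvals, ld, 0.2, 0.05))
    print("posterior weights:", ldpred_inf(beta, ld, n=10_000, h2=0.5))
```

In the toy demo, P+T keeps only the more significant of the two correlated markers and sets the other to zero, while the Gaussian-prior posterior mean gives both a nonzero, LD-adjusted weight. That difference is the information loss the abstract attributes to pruning. Using `np.linalg.solve` on a small LD window keeps the sketch simple; it is not meant to reflect how the published method scales to genome-wide data.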

Original language: English
Pages (from-to): 576-592
Number of pages: 17
Journal: American Journal of Human Genetics
Volume: 97
Issue number: 4
Publication status: Published - 1 Jan 2015

Bibliographical note

Funding Information:
We thank Shamil Sunyaev, Brendan Bulik-Sullivan, Liming Liang, Naomi Wray, Daniel Sørensen, and Esben Agerbo for useful discussions. We would also like to thank Toni Clarke for useful comments on the software. This research was supported by NIH grants R01 GM105857, R03 CA173785, and U19 CA148065-01. B.J.V. was supported by Danish Council for Independent Research grant DFF-1325-0014. H.K.F. was supported by the Fannie and John Hertz Foundation. This study made use of data generated by the Wellcome Trust Case Control Consortium (WTCCC) and the Wellcome Trust Sanger Institute. A full list of the investigators who contributed to the generation of the WTCCC data is available at www.wtccc.org.uk. Funding for the WTCCC project was provided by the Wellcome Trust under award 076113.

Publisher Copyright:
© 2015 The American Society of Human Genetics. All rights reserved.
