Principal component analysis

Michael Greenacre*, Patrick J.F. Groenen, Trevor Hastie, A Iodice D'Enza, Angelos Markos, Elena Tuzhilina

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

83 Citations (Scopus)


Principal component analysis is a versatile statistical method for reducing a cases-by-variables data table to its essential features, called principal components. Principal components are a few linear combinations of the original variables that maximally explain the variance of all the variables. In the process, the method provides an approximation of the original data table using only these few major components. This Primer presents a comprehensive review of the method’s definition and geometry, as well as the interpretation of its numerical and graphical results. The main graphical result is often in the form of a biplot, using the major components to map the cases and adding the original variables to support the distance interpretation of the cases’ positions. Variants of the method are also treated, such as the analysis of grouped data and categorical data, known as correspondence analysis. Also described and illustrated are the latest innovative applications of principal component analysis: for estimating missing values in huge data matrices, sparse component estimation, and the analysis of images, shapes and functions. Supplementary material includes video animations and computer scripts in the R environment.
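As a rough illustration of the ideas in the abstract, the sketch below computes principal components as linear combinations of the column-centred variables and uses the leading ones to approximate the data table. It is not taken from the article's supplementary R scripts. The function name `pca`, the simulated toy data and the choice of the singular value decomposition in NumPy are all assumptions made for this example.

```python
import numpy as np


def pca(X, n_components):
    """Minimal PCA sketch for a cases-by-variables table (illustrative only)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # centre each variable (column)

    # SVD of the centred table: Xc = U * diag(s) * Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Coefficients of the linear combinations (one column per component)
    loadings = Vt[:n_components].T
    # Coordinates of the cases on the leading components
    scores = U[:, :n_components] * s[:n_components]
    # Proportion of total variance explained by each retained component
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    # Low-rank approximation of the original table
    approx = mean + scores @ loadings.T
    return scores, loadings, explained, approx


# Hypothetical toy data: three variables driven by one latent factor plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
X = latent @ np.array([[2.0, 1.0, -1.0]]) + 0.1 * rng.normal(size=(50, 3))

scores, loadings, explained, approx = pca(X, n_components=1)
print("variance explained by first component:", explained[0])
print("largest absolute approximation error:", np.abs(X - approx).max())
```

Here the rows of `scores` would give the positions of the cases in a map, and the rows of `loadings` the directions of the variables, which are the two ingredients a biplot combines. Real analyses typically also decide whether to standardize the variables before the decomposition, which this sketch leaves out.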

Original language: English
Article number: 100
Number of pages: 21
Journal: Nature Reviews Methods Primers
Issue number: 1
Publication status: Published - Dec 2022

Bibliographical note

Funding Information:
This review is dedicated to the memory of Professor Cas Troskie, who was the head of the Department of Statistics at the University of Cape Town, both teacher and mentor to M.G. and T.H., and who planted the seeds of principal component analysis in them at an early age. T.H. was partially supported by grants DMS2013736 and IIS1837931 from the National Science Foundation, and grant 5R01 EB001988-21 from the National Institutes of Health. E.T. was supported by the Stanford Data Science Institute.

Publisher Copyright:
© 2022, Springer Nature Limited.

