Distance analysis of large data sets of categorical variables using object weights

Patrick J.F. Groenen*, Jacques J.F. Commandeur, Jacqueline J. Meulman

*Corresponding author for this work

Research output: Contribution to journal, Article, Academic, peer-reviewed

4 Citations (Scopus)

Abstract

Categorical variables are often analysed by multiple correspondence analysis (or homogeneity analysis), which places great emphasis on graphical representation. A drawback of this method is that sometimes only minor aspects of the data are displayed, or, if a dominant first dimension exists, the horseshoe effect occurs. Here we elaborate on a competing approach to multiple correspondence analysis based on distance approximation. This method emphasizes the distances between objects: objects are displayed graphically as points, and objects close together are considered more similar than objects farther apart. A limiting factor of this method is that the number of objects cannot be very large (say, no more than 500). We show how the majorization algorithm for distance approximation can be extended using frequency counts as object weights, such that much larger data sets can be analysed without a significant amount of additional computational effort. A second advantage of the use of object weights is that resampling methods, such as the bootstrap, are easily implemented. We present two illustrative examples, and investigate the stability of one of them through the bootstrap.
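The idea of using frequency counts as object weights can be illustrated with a minimal sketch. It is not the authors' implementation. It only assumes that objects with identical response profiles can be collapsed into one weighted object, and that a bootstrap replicate can then be drawn by resampling the counts rather than the raw rows. The function names collapse_profiles and bootstrap_weights and the toy data are hypothetical.

```python
import numpy as np

def collapse_profiles(data):
    """Collapse identical rows (response profiles) into unique profiles
    and the frequency count of each profile (the object weights)."""
    profiles, counts = np.unique(np.asarray(data), axis=0, return_counts=True)
    return profiles, counts

def bootstrap_weights(counts, rng):
    """Draw one bootstrap replicate of the object weights.

    Resampling n objects with replacement is equivalent to drawing
    multinomial counts over the unique profiles with probabilities
    proportional to the observed frequencies."""
    n = int(counts.sum())
    return rng.multinomial(n, counts / n)

# Toy data (hypothetical): 6 objects scored on 2 categorical variables.
data = np.array([[0, 1], [0, 1], [1, 0], [0, 1], [1, 1], [1, 0]])
profiles, counts = collapse_profiles(data)

rng = np.random.default_rng(0)
boot = bootstrap_weights(counts, rng)  # new weights, same unique profiles
```

In this sketch the number of points to be placed equals the number of unique profiles rather than the number of objects, and each bootstrap replicate changes only the weights, so the set of profiles (and the distances between them) never has to be rebuilt.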

Original language: English
Pages (from-to): 217-232
Number of pages: 16
Journal: British Journal of Mathematical and Statistical Psychology
Volume: 51
Issue number: 2
Publication status: Published - Nov 1998
