TY - JOUR

T1 - Distance analysis of large data sets of categorical variables using object weights

AU - Groenen, Patrick J.F.

AU - Commandeur, Jacques J.F.

AU - Meulman, Jacqueline J.

PY - 1998/11

Y1 - 1998/11

N2 - Categorical variables are often analysed by multiple correspondence (or homogeneity) analysis, which places great emphasis on graphical representation. A drawback of this method is that sometimes only minor aspects of the data are displayed, or, if a dominant first dimension exists, the horseshoe effect occurs. Here we elaborate on a competing approach to multiple correspondence analysis based on distance approximation. This method emphasizes the distance between objects; they are graphically displayed as points, and objects close together are considered more similar than objects farther apart. A limiting factor of this method is that the number of objects cannot be very large (say, no more than 500). We show how the majorization algorithm for distance approximation can be extended using frequency counts as object weights such that much larger data sets can be analysed without a significant amount of additional computational effort. A second advantage of the use of object weights is that resampling methods, such as the bootstrap, are easily implemented. We present two illustrative examples, and investigate the stability in one of them through the bootstrap.

AB - Categorical variables are often analysed by multiple correspondence (or homogeneity) analysis, which places great emphasis on graphical representation. A drawback of this method is that sometimes only minor aspects of the data are displayed, or, if a dominant first dimension exists, the horseshoe effect occurs. Here we elaborate on a competing approach to multiple correspondence analysis based on distance approximation. This method emphasizes the distance between objects; they are graphically displayed as points, and objects close together are considered more similar than objects farther apart. A limiting factor of this method is that the number of objects cannot be very large (say, no more than 500). We show how the majorization algorithm for distance approximation can be extended using frequency counts as object weights such that much larger data sets can be analysed without a significant amount of additional computational effort. A second advantage of the use of object weights is that resampling methods, such as the bootstrap, are easily implemented. We present two illustrative examples, and investigate the stability in one of them through the bootstrap.

UR - http://www.scopus.com/inward/record.url?scp=0009209173&partnerID=8YFLogxK

U2 - 10.1111/j.2044-8317.1998.tb00678.x

DO - 10.1111/j.2044-8317.1998.tb00678.x

M3 - Article

AN - SCOPUS:0009209173

SN - 0007-1102

VL - 51

SP - 217

EP - 232

JO - British Journal of Mathematical and Statistical Psychology

JF - British Journal of Mathematical and Statistical Psychology

IS - 2

ER -