Short personality questionnaires are increasingly used in research and practice, with some scales containing as few as two to five items per personality domain. Despite their frequent use, these short scales are often criticized for their lower internal consistency and their purported failure to capture the breadth of broad constructs such as the Big Five personality factors. One reason for this criticism may be that test construction typically relies on principles rooted in Classical Test Theory (CTT). In this study, Generalizability Theory (GT) is used to compare the psychometric properties of different scales based on the NEO-PI-R and BFI, two widely used families of personality questionnaires. Applying both CTT and GT made it possible to examine the inner workings of test shortening. CTT-based analyses indicated that longer scales are generally more reliable. GT, in contrast, distinguished between reliability for relative and absolute decisions and revealed how different sources of variance affect reliability estimates. These variance sources differed with scale length, and only GT described these internal consequences clearly, which allowed the advantages and disadvantages of shorter and longer scales to be identified more effectively. Most importantly, the findings highlight the potential pitfalls of focusing solely on reliability and scale length during test construction. Practical and theoretical consequences are discussed.
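To make the distinction between relative and absolute reliability concrete, the following is a minimal sketch, assuming the simplest single-facet crossed persons × items (p × i) design with one response per cell and simulated data. It is not the study's analysis code; the function names and the simulated effect sizes are hypothetical. The G-study step estimates variance components from two-way ANOVA mean squares. The D-study step then projects the generalizability coefficient (relative decisions, where only person × item interaction/residual variance counts as error) and the dependability coefficient Φ (absolute decisions, where item variance also counts as error) for a chosen number of items. For the observed number of items, the G coefficient reproduces Cronbach's alpha, which is the point where CTT and GT meet.

```python
import numpy as np


def g_study_pxi(scores):
    """Estimate variance components for a crossed persons x items design.

    scores: 2-D array, rows = persons, columns = items, one response per cell.
    Returns (var_p, var_i, var_res); negative estimates are set to zero.
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape
    grand = x.mean()
    p_means = x.mean(axis=1)
    i_means = x.mean(axis=0)

    # Mean squares from a two-way ANOVA without replication
    ms_p = n_i * np.sum((p_means - grand) ** 2) / (n_p - 1)
    ms_i = n_p * np.sum((i_means - grand) ** 2) / (n_i - 1)
    resid = x - p_means[:, None] - i_means[None, :] + grand
    ms_res = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations; truncate negatives at zero
    var_res = ms_res  # person x item interaction confounded with error
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)
    return var_p, var_i, var_res


def d_study(var_p, var_i, var_res, n_items):
    """Return (G coefficient, Phi) for a scale with n_items items."""
    rel_error = var_res / n_items            # error for relative decisions
    abs_error = (var_i + var_res) / n_items  # item variance also counts
    g_coef = var_p / (var_p + rel_error)
    phi = var_p / (var_p + abs_error)
    return g_coef, phi


def cronbach_alpha(scores):
    """Classical (CTT) internal consistency for comparison."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical simulated data: 200 persons, 8 items
rng = np.random.default_rng(0)
person = rng.normal(0.0, 1.0, size=(200, 1))  # true person effects
item = rng.normal(0.0, 0.5, size=(1, 8))      # item difficulty effects
noise = rng.normal(0.0, 1.0, size=(200, 8))   # interaction + residual error
scores = person + item + noise

vp, vi, vr = g_study_pxi(scores)
for n in (2, 4, 8, 16):
    g, phi = d_study(vp, vi, vr, n)
    print(f"{n:2d} items: G = {g:.3f}, Phi = {phi:.3f}")
print(f"alpha (8 items) = {cronbach_alpha(scores):.3f}")
```

Because item variance enters only the absolute error term, Φ never exceeds the G coefficient, and the gap between them depends on how much items differ in their means. Both coefficients rise as items are added, which mirrors the CTT intuition that longer is better, but the variance decomposition shows which error source is being reduced.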