Study Design. An international group of back pain researchers considered recommendations for standardized measures in clinical outcomes research in patients with back pain. Objectives. To promote more standardization of outcome measurement in clinical trials and other types of outcomes research, including meta-analyses, cost-effectiveness analyses, and multicenter studies. Summary of Background Data. Better standardization of outcome measurement would facilitate comparison of results among studies, and more complete reporting of relevant outcomes. Because back pain is rarely fatal or completely cured, outcome assessment is complex and involves multiple dimensions. These include symptoms, function, general well-being, work disability, and satisfaction with care. Methods. The panel considered several factors in recommending a standard battery of outcome measures. These included reliability, validity, responsiveness, and practicality of the measures. In addition, compatibility with widely used and promoted batteries such as the American Academy of Orthopaedic Surgeons Lumbar Cluster was considered to minimize the need for changes when these instruments are used. Results. First, a six-item set was proposed, which is sufficiently brief that it could be used in routine care settings for quality improvement and for research purposes. An expanded outcome set, which would provide more precise measurement for research purposes, includes measures of severity and frequency of symptoms, either the Roland or the Oswestry Disability Scale, either the SF-12 or the EuroQol measure of general health status, a question about satisfaction with symptoms, three types of 'disability days,' and an optional single item on overall satisfaction with medical care. Conclusion. Standardized measurement of outcomes would facilitate scientific advances in clinical care. A short, six-item questionnaire and a somewhat expanded, more precise battery of questionnaires can be recommended.
Although many considerations support such recommendations, more data on responsiveness and the minimally important change in scores are needed for most of the instruments.