Objective To apply a data-driven approach to identify, in patients newly presenting with undifferentiated inflammatory synovitis, the key variables that discriminate the subset of patients at sufficiently high risk of persistent or erosive disease, for the purpose of developing new criteria for rheumatoid arthritis (RA).

Methods In this first phase of the collaborative effort of the American College of Rheumatology and the European League Against Rheumatism to develop new criteria for RA, a pooled analysis of early arthritis cohorts made available by the respective investigators is presented. First, all variables associated with the gold standard of treatment with methotrexate during the first year after enrolment were identified. Principal component analysis was then used to group the significant variables into sets representing similar domains. In a final step, one representative variable was extracted from each domain, and these variables were tested for their independent effects in a multivariate regression model. The relative weight of each variable was estimated from its odds ratio (OR) in that final model.

Results The final domains and variables identified by this process (and their relative weights) were: swelling of a metacarpophalangeal joint (MCP; 1.5), swelling of a proximal interphalangeal joint (PIP; 1.5), swelling of the wrist (1.5), tenderness of the hand (ie, MCP, PIP or wrist; 2), acute phase reaction (ie, C reactive protein or erythrocyte sedimentation rate, with weights of 1 for moderate and 2 for high elevation of either one) and serological abnormalities (ie, rheumatoid factors or anti-citrullinated protein antibodies, again with separate weights of 2 for moderate and 4 for high elevation).

Conclusion The results of this first phase were subsequently used in the second phase of the project, which is reported in a separate methodological paper, and for derivation of the final set of criteria.
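
To make the relative weights listed in the Results concrete, the following minimal Python sketch tallies them for a single patient. It is illustrative only: the `Patient` record, the field names, the `weighted_score` function, the weight of 0 for a normal result and the simple additive combination are all assumptions made for this example and are not specified in the abstract. No classification threshold is implied.

```python
from dataclasses import dataclass

# Relative weights as reported in the Results.
SWELLING_WEIGHTS = {"mcp": 1.5, "pip": 1.5, "wrist": 1.5}
HAND_TENDERNESS_WEIGHT = 2.0
# A weight of 0 for a normal result is an assumption of this sketch.
ACUTE_PHASE_WEIGHTS = {"normal": 0.0, "moderate": 1.0, "high": 2.0}
SEROLOGY_WEIGHTS = {"normal": 0.0, "moderate": 2.0, "high": 4.0}


@dataclass
class Patient:
    """Hypothetical record layout; field names are illustrative."""
    swollen_mcp: bool = False
    swollen_pip: bool = False
    swollen_wrist: bool = False
    tender_hand: bool = False      # tenderness of MCP, PIP or wrist
    acute_phase: str = "normal"    # "normal", "moderate" or "high"
    serology: str = "normal"       # "normal", "moderate" or "high"


def weighted_score(p: Patient) -> float:
    """Sum the weights of the findings present.

    Adding the weights together is an assumption of this sketch.
    """
    score = 0.0
    if p.swollen_mcp:
        score += SWELLING_WEIGHTS["mcp"]
    if p.swollen_pip:
        score += SWELLING_WEIGHTS["pip"]
    if p.swollen_wrist:
        score += SWELLING_WEIGHTS["wrist"]
    if p.tender_hand:
        score += HAND_TENDERNESS_WEIGHT
    score += ACUTE_PHASE_WEIGHTS[p.acute_phase]
    score += SEROLOGY_WEIGHTS[p.serology]
    return score


example = Patient(swollen_mcp=True, tender_hand=True,
                  acute_phase="moderate", serology="high")
print(weighted_score(example))  # 1.5 + 2 + 1 + 4 = 8.5
```

Under these assumptions, a patient with MCP swelling, hand tenderness, a moderately elevated acute phase reactant and highly elevated serology accumulates a total of 8.5, and the largest possible total is 12.5.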