Inter‐examiner reliability of the Doha agreement meeting classification system of groin pain in male athletes

The Doha agreement classification is used to classify groin pain in athletes. We evaluated the inter-examiner reliability of this classification system. We prospectively recruited 48 male athletes (66 symptomatic sides) with groin pain between October 2017 and March 2020 at a sports medicine hospital in Qatar. Two examiners (with 23 and 10 years of clinical experience), blinded to each other's findings, performed history taking and a standardized clinical examination. Examiners classified groin pain using the Doha agreement terminology (adductor-, inguinal-, iliopsoas-, pubic-, or hip-related groin pain, or other causes of groin pain). Multiple entities were ranked in order of perceived clinical importance. For bilateral groin pain, each side was classified separately. Inter-examiner reliability was calculated using Cohen's kappa statistic (κ). Inter-examiner reliability was slight to moderate for adductor- (κ = 0.40), inguinal- (κ = 0.44), iliopsoas- (κ = 0.57), and pubic-related groin pain (κ = 0.12), substantial for hip-related groin pain (κ = 0.62), and slight for "other causes of groin pain" (κ = 0.13). Ranking entities in order of perceived clinical importance improved inter-examiner reliability for adductor-, inguinal-, and iliopsoas-related groin pain (κ = 0.52-0.65), but not for pubic-related (κ = 0.12), hip-related (κ = 0.51), or "other causes of groin pain" (κ = 0.03). For participants with unilateral groin pain classified with a single entity (n = 7), there was 100% agreement between the two examiners. Inter-examiner reliability of the Doha agreement meeting classification system varied from slight to substantial, depending on the clinical entity. Agreement between examiners was perfect when athletes were classified with a single clinical entity of groin pain, but lower when athletes were classified with multiple clinical entities.

KEYWORDS
groin pain, clinical assessment, hip/pelvis/thigh, diagnosis

| INTRODUCTION
The Doha agreement meeting classification system of groin pain in athletes is a clinical classification system based on patient-reported injury history and clinical examination findings. 1 The classification system addressed the problem of heterogeneous terminology for the diagnosis of longstanding groin pain in athletes, 2,3 and was the result of an agreement meeting among 24 international groin experts. 1 The classification system has three major subheadings: (1) defined clinical entities for groin pain (adductor-, inguinal-, iliopsoas-, and pubic-related groin pain), (2) hip-related groin pain, and (3) other causes of groin pain. 1 Because the groin pain experienced by athletes can involve numerous anatomical structures, athletes can be classified as having a single entity (e.g., left-sided adductor-related groin pain) or multiple entities (e.g., bilateral adductor- and inguinal-related groin pain). 4 If multiple entities are involved, they can be ranked according to their perceived clinical importance. 5,6 The Doha agreement classification system has since been adopted internationally by many clinicians who assess and treat athletes with groin pain. 7 This uptake suggests good clinical utility, but the reliability of the classification system has yet to be evaluated. In other words, the potential variation between examiners in classifying athletes' groin pain according to the Doha agreement meeting classification system (the inter-examiner reliability) should be investigated. 8 Our primary aim was to evaluate the inter-examiner reliability of the Doha agreement meeting classification system of groin pain in athletes. The secondary aim was to evaluate whether ranking the clinical entities in descending order of perceived clinical importance (primary vs. secondary) would improve the inter-examiner reliability.

| MATERIALS AND METHODS
The study was registered on ClinicalTrials.gov (NCT03590145) and approved by the Anti-Doping Lab Qatar Institutional Review Board (IRB#: E2017000204). The study was performed at an orthopedic and sports medicine hospital in Doha, Qatar. Written informed consent was acquired from all study participants prior to inclusion.

| Protocol deviation
The secondary analyses were not pre-specified in the registered protocol but were specified before the primary analysis was performed. Additionally, following reviewer comments, we added a subgroup analysis of the entity classification (paragraph 2.6) to ensure more complete and transparent reporting of the findings. This allows readers to see the difference between applying the strictly defined classification criteria and the expert-based application of the Doha agreement recommendations.

| Participants
Individuals were eligible for inclusion if they: (1) were male adult athletes (18-40 years old), (2) regularly participated in recreational or elite sports (≥ once per week), and (3) presented with a gradual onset of hip and/or groin pain that worsened with exercise, or a sudden onset of hip and/or groin pain that had not recovered and had become longstanding (≥6 weeks). Exclusion criteria included any prior clinical examination or treatment by one of the examiners for the same complaint(s), and fractures or other acute injuries with severe pain where it would be unethical to examine the athlete on two separate occasions. Athletes were also excluded if a second examination could not be performed within 7 days after the first examination. Athletes were prospectively recruited between October 2017 and March 2020.

| Procedures
A comprehensive structured clinical examination (Appendix A) was performed by two clinicians specialized in groin pain in athletes: a general surgeon with 23 years of clinical experience (ZV) and a physiotherapist with 10 years of clinical experience (AS). Both examiners were part of the expert panel at the Doha agreement meeting on groin pain in athletes. 1 The examiners were part of a multidisciplinary groin clinic and had therefore assessed many athletes with groin pain together before this study commenced. Patients were included from any of the sports medicine clinics at the hospital. Study inclusion and exclusion criteria were available on a leaflet in every outpatient clinic. The coordinating researcher was notified when an athlete was eligible for inclusion. For practical reasons, the order of examiners was decided by the clinic from which the participant was recruited: if the participant was recruited from the general surgeon's clinic, the general surgeon performed the first clinical assessment; if the participant was recruited from a sports medicine clinic within the hospital, the physiotherapist performed the first clinical assessment. The second examiner was blinded to all information from the first clinical examination. Both examiners were blinded to previous imaging reports (if any). Additionally, the participants were instructed not to share any information pertaining to the findings of the first examination. Both examiners could perform additional clinical tests at their discretion.

| Injury history and self-reported symptoms
Each clinician engaged the participants in a semi-structured dialogue about their injury history and self-reported symptoms. This included questions relevant to classifying athletes according to the Doha agreement meeting classification system, such as pain location, onset, duration of pain, and aggravating activities. 1,9 Clinicians were permitted to ask as many questions as they wanted.
Participants also completed the Copenhagen Hip and Groin Outcome Score (HAGOS). 10 The HAGOS is a patient-reported outcome questionnaire comprising six subscales: symptoms; pain; function in daily living (ADL); function in sport and recreation; participation in physical activities; and hip and/or groin-related quality of life. Scores for each subscale are transformed to a 0-100 scale, where 0 represents extreme hip and/or groin symptoms and 100 represents no hip and/or groin symptoms. Examiners were blinded to the HAGOS scores. For non-English-speaking athletes, an Arabic version of the HAGOS was available, as was a translator during injury history taking and clinical examination. The HAGOS scores were used only to describe participant characteristics.
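The 0-100 transformation described above can be sketched as follows. This is a minimal illustration that assumes the KOOS-style scoring commonly described for the HAGOS: each item is scored 0 (no problems) to 4 (extreme problems) and a subscale score is 100 minus the mean item score scaled to 0-100. The function name `hagos_subscale_score` and the "at least half the items answered" missing-data rule are hypothetical choices for illustration, not taken from the HAGOS scoring manual.

```python
from statistics import mean
from typing import Optional, Sequence


def hagos_subscale_score(
    item_scores: Sequence[Optional[int]],
    min_answered_fraction: float = 0.5,  # assumed missing-data rule, not from the source
) -> Optional[float]:
    """Transform raw item scores (0 = none ... 4 = extreme) to a 0-100 subscale score.

    100 means no hip/groin symptoms and 0 means extreme symptoms, matching the
    orientation described in the text. Returns None if too few items were answered.
    """
    answered = [s for s in item_scores if s is not None]
    if not item_scores or len(answered) < min_answered_fraction * len(item_scores):
        return None
    for s in answered:
        if not 0 <= s <= 4:
            raise ValueError(f"item score out of range 0-4: {s}")
    # Invert and rescale so that a mean item score of 0 maps to 100 and 4 maps to 0.
    return 100.0 - mean(answered) * 100.0 / 4.0


# Hypothetical five-item subscale with one unanswered item.
print(hagos_subscale_score([0, 1, 2, None, 1]))
```

Inverting the scale keeps the convention stated in the text, where higher scores indicate fewer symptoms.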

| Clinical examination
The structured clinical examination consisted of pain provocation tests, including palpation, resistance testing, and stretching of the hip adductors, hip flexors, and abdominal muscles. Hip range of motion and hip impingement tests such as the flexion-adduction-internal rotation (FADIR) and flexion-abduction-external rotation (FABER) tests were also performed (Appendix A). The pain reported by the athlete during the tests had to correspond to the athlete's self-reported groin pain experienced during participation in sport. The palpation, resistance, and stretch tests used in the classification have moderate to almost perfect inter-examiner reliability for the presence of pain. 11

| Doha agreement classification
After injury history taking and clinical examination, each examiner independently classified the athletes' groin pain using the Doha agreement meeting terminology (Figure 1).

Clinical entity | Symptoms and examination findings
Adductor-related groin pain | Adductor tenderness and pain on resisted adduction testing.
Iliopsoas-related groin pain | Iliopsoas tenderness. More likely if there is pain on resisted hip flexion and/or pain on hip flexor stretching.
Inguinal-related groin pain | Pain in the inguinal canal region and tenderness of the inguinal canal. No palpable inguinal hernia is present. More likely if aggravated by abdominal resistance or Valsalva/cough/sneeze.
Pubic-related groin pain | Local tenderness of the pubic symphysis and the immediately adjacent bone. No particular resistance tests are suggested to provoke symptoms related to pubic-related groin pain.
Hip-related groin pain | Clinical suspicion that the hip joint is the source of groin pain, either through history (e.g., mechanical symptoms of locking or catching) and/or clinical examination (e.g., painful and limited range of motion of the hip).
Other causes for groin pain | Clinical suspicion of symptoms or a diagnosis that cannot be classified into one of the previously mentioned entities.
Each examiner could classify either a single entity or multiple entities. If not all required clinical examination findings associated with a specific entity were present, classification was at the discretion of the examiner. For example, if a patient reported medial thigh pain while playing football and on adductor palpation, but no pain during resisted hip adduction, this could still be classified as adductor-related groin pain if the examiner considered it appropriate. If multiple entities were classified, examiners were asked to rank the entities in descending order of perceived clinical importance (primary entity, secondary entity, etc.). 5,6 If the examiner considered multiple entities equally important, they were allowed to assign them the same rank. If athletes had bilateral symptoms, each side was classified separately. Because pubic-related groin pain is located at the midline, it was considered central rather than side-specific.
Additionally, a subgroup analysis was performed only on the classified entities that met the specific clinical criteria defined in the Doha agreement meeting classification system (Figure 1), based on the clinical examination findings reported by the examiners. For example, for adductor-related groin pain, athletes had to report recognizable injury pain during adductor palpation and recognizable injury pain in the adductors on at least one hip adduction resistance test. If an athlete reported adductor pain on palpation but not on resistance testing (or vice versa), this was not classified as adductor-related groin pain in the subgroup analysis. Hip-related groin pain and "other causes of groin pain" were excluded from this subgroup analysis, since these entities do not have defined clinical criteria and/or may require additional investigation for a more definitive classification or diagnosis.
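The classification rules above (ranked entities with ties allowed, and the strict adductor criterion used in the subgroup analysis) can be made concrete with a small sketch. All names here (`ENTITIES`, `Rank`, `code_side`, `AdductorFindings`, `meets_strict_adductor_criteria`) are hypothetical. Pooling "secondary" with any lower rank into one category is an assumption; the text only states that entities were ranked primary, secondary, and so on.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List

# Hypothetical labels for the six Doha agreement categories.
ENTITIES = ("adductor", "inguinal", "iliopsoas", "pubic", "hip", "other")


class Rank(IntEnum):
    ABSENT = 0
    SECONDARY = 1  # assumption: secondary and any lower rank are pooled
    PRIMARY = 2


def code_side(ranked_groups: List[List[str]]) -> Dict[str, Rank]:
    """Code one examiner's ranked classification of one side, entity by entity.

    `ranked_groups` lists rank groups in descending perceived clinical importance;
    entities judged equally important share a group, so every entity in the
    first group is coded PRIMARY.
    """
    coded = {entity: Rank.ABSENT for entity in ENTITIES}
    for position, group in enumerate(ranked_groups):
        for entity in group:
            if entity not in coded:
                raise ValueError(f"unknown entity: {entity}")
            coded[entity] = Rank.PRIMARY if position == 0 else Rank.SECONDARY
    return coded


@dataclass
class AdductorFindings:
    palpation_pain: bool   # recognizable injury pain on adductor palpation
    resistance_pain: bool  # recognizable adductor pain on >= 1 resisted adduction test


def meets_strict_adductor_criteria(findings: AdductorFindings) -> bool:
    """Subgroup rule described in the text: both findings must be present."""
    return findings.palpation_pain and findings.resistance_pain


# Hypothetical example: adductor and inguinal tied as primary, iliopsoas secondary.
side = code_side([["adductor", "inguinal"], ["iliopsoas"]])
print({entity: rank.name for entity, rank in side.items()})
print(meets_strict_adductor_criteria(AdductorFindings(palpation_pain=True, resistance_pain=False)))
```

Reducing each `Rank` to present or absent (`rank > Rank.ABSENT`) gives a binary coding per entity, while keeping all three levels gives a three-category coding. This is one plausible way to build the two kinds of per-entity tables, not necessarily how the study data were coded.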

| Sample size
We expected a prevalence between 0.3 and 0.7 for the three most common defined clinical entities (adductor-related, inguinal-related, and iliopsoas-related groin pain). 4 With an expected κ of at least 0.8 and a lower 95% confidence limit of 0.4, assuming no bias between the two examiners, the required sample size was determined to be 48. 13 We expected pubic-related and hip-related groin pain to be less frequent, 4 and therefore accepted a lower 95% confidence limit of 0 for these entities. With the expected κ kept at 0.8, this would require a sample of only 10 participants. 13 Thus, we aimed to include 48 participants.
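A calculation of this kind can be approximated with Donner and Eliasziw's goodness-of-fit approach for a binary rating by two examiners with no bias. It is an assumption that this is the method behind reference 13. The text also does not state the significance level, sidedness, or power, so the sketch below exposes them as parameters and defaults to a two-sided α of 0.05 and 80% power. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def cell_probabilities(kappa: float, prevalence: float) -> Tuple[float, float, float]:
    """P(both positive), P(discordant), P(both negative) for two examiners with no bias."""
    spread = prevalence * (1 - prevalence)
    both_positive = prevalence ** 2 + spread * kappa
    discordant = 2 * spread * (1 - kappa)
    both_negative = (1 - prevalence) ** 2 + spread * kappa
    return both_positive, discordant, both_negative


def kappa_sample_size(
    kappa_expected: float,
    kappa_null: float,
    prevalence: float,
    alpha: float = 0.05,   # assumed; not stated in the text
    power: float = 0.80,   # assumed; not stated in the text
    two_sided: bool = True,
) -> int:
    """Subjects needed to distinguish kappa_expected from kappa_null (goodness-of-fit approach)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    # Approximate non-centrality of a 1-df chi-square test at the requested alpha and power.
    noncentrality = (z_alpha + z(power)) ** 2
    expected = cell_probabilities(kappa_expected, prevalence)
    null = cell_probabilities(kappa_null, prevalence)
    # Per-subject chi-square contribution; needs kappa_null < 1 so no null cell is zero.
    effect = sum((e - n) ** 2 / n for e, n in zip(expected, null))
    return math.ceil(noncentrality / effect)


for p in (0.3, 0.5, 0.7):
    print(p, kappa_sample_size(0.8, 0.4, p))
```

Under these assumptions, the sketch returns 48 at a prevalence of 0.3 or 0.7 and a smaller number at 0.5. The ends of the expected prevalence range therefore determine the required sample size.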

| Inter-examiner reliability and agreement
The general surgeon was the first examiner for 39/48 participants. On average, the general surgeon classified 3.3 entities per participant and the physiotherapist 2.8 entities per participant. The two blinded examiners agreed on the same classification or combination of classifications in 14/48 (29%) participants and 15/66 (23%) sides. Seven of the 48 participants had unilateral symptoms involving only one clinical entity; in these cases, the blinded examiners agreed in 100% (7/7). Detailed 2 × 2 and 3 × 3 tables are presented in Appendix B.
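Cohen's κ, the statistic used throughout, compares the observed proportion of agreement with the agreement expected by chance from each examiner's marginal totals: κ = (p_o − p_e) / (1 − p_e). Below is a minimal pure-Python sketch for a square table of counts, which covers both the 2 × 2 (present/absent) and 3 × 3 layouts mentioned above. The function name and the example counts are hypothetical and are not study data.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[float]]) -> float:
    """Cohen's kappa for a square contingency table of counts.

    Rows are examiner A's categories and columns examiner B's, in the same order.
    Undefined (division by zero) if chance agreement equals 1.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    row_margins = [sum(row) / n for row in table]
    col_margins = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    expected = sum(r * c for r, c in zip(row_margins, col_margins))
    return (observed - expected) / (1 - expected)


# Hypothetical 2 x 2 table for one entity across 66 sides (rows: examiner A yes/no).
print(round(cohens_kappa([[20, 5], [6, 35]]), 3))
```

For a ranked analysis, the same function accepts a 3 × 3 table, for example with the categories absent, secondary, and primary in the same order for both examiners.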
Both examiners performed the full clinical examination protocol for every athlete, and no additional clinical tests were performed beyond those specified in Appendix A. The physiotherapist classified nine sides as "other causes of groin pain" because of pain at the distal rectus abdominis; in six of these nine sides, this was in addition to inguinal-related groin pain. The general surgeon classified eight of these nine sides as inguinal-related groin pain and none as "other causes of groin pain." The list of other causes can be found in Appendix C. Table 3 presents the subgroup analysis of the inter-examiner reliability of the defined clinical entities of groin pain according to the specific criteria reported in the Doha agreement meeting classification system (Figure 1). Fair to substantial reliability was found for the four defined clinical entities of groin pain (κ = 0.23-0.73).

| DISCUSSION
Our study found that the inter-examiner reliability of the Doha agreement meeting classification system of groin pain in athletes varies from slight to substantial. The overall agreement between examiners was perfect when only a single entity of groin pain was classified, but lower when athletes were classified as having multiple entities.
The inter-examiner reliability of the Doha agreement meeting classification system, or of other classification systems of groin pain in athletes, has not been evaluated previously. Hölmich et al. 11 evaluated the inter-examiner reliability of 10 pain provocation tests that were also part of our clinical examination protocol. That study found substantial to almost perfect inter-examiner reliability (κ = 0.64-0.94) for 8 of the 10 individual pain provocation tests, while two tests evaluating pain during resisted abdominal contractions were moderately reliable (κ = 0.41-0.57). The kappa and agreement values for the clinical entities in our study were lower than those of the tests described by Hölmich et al. 5 A potential explanation is that classifying an entity requires a higher level of clinical reasoning than interpreting a single test. In the clinical setting, the patient history and the results of multiple tests are combined and interpreted to classify the groin pain, rather than only determining the presence or absence of pain in a single test, and each additional judgment gives examiners another opportunity to disagree.
In athletes with unilateral groin pain classified by both examiners with only a single entity (seven participants), we found 100% agreement between the examiners. Hence, clinicians should be aware that the inter-examiner reliability of the Doha agreement meeting classification is lower when an athlete presents with bilateral groin pain and/or multiple entities. For athletes with self-reported pain in multiple locations, ranking entities or diagnoses in descending order of perceived clinical importance has been described previously, although the reliability of such an approach was not evaluated. 5,6 We found that when examiners ranked entities by perceived clinical importance, inter-examiner reliability was higher for the most prevalent entities: adductor-, inguinal-, and iliopsoas-related groin pain (Table 2). Potentially, the primary entity or entities should be the main focus, but whether this influences injury management and prognosis requires further investigation.
The high proportion of athletes presenting with bilateral groin pain and/or multiple clinical entities can be explained by recruitment in a tertiary care setting (a groin clinic within a sports medicine hospital). These cases may be more complex than those expected in a primary care setting, which may negatively influence inter-examiner reliability. Additionally, previous studies have shown that the distribution of groin pain entities differs across settings. A study from a private sports medicine clinic in the United Kingdom 14 found "hip-pathology" to be most prevalent, whereas studies from an orthopedic surgeon's clinic in Denmark 5 and a sports medicine hospital in Qatar 4 reported adductor-related groin pain to be the most prevalent entity (~61%). In our study, most participants were recruited from a general surgeon's groin clinic, which most likely explains the high prevalence of inguinal-related groin pain.
There was a low prevalence of hip-related groin pain (11%) and "other causes of groin pain" (10%) (Table 2). When prevalence is low (or high), the agreement expected by chance is high, and the kappa value is reduced accordingly. 13 For this reason, these kappa values may underestimate reliability. Conversely, a larger bias index can result in an overestimation of kappa. The bias index shows whether one examiner classified a specific entity more or less often than the other examiner; a bias index of 0 indicates no bias between examiners. In our study, some bias between examiners was detected for pubic-related (bias index: 0.27), iliopsoas-related (bias index: 0.18), and inguinal-related groin pain (bias index: 0.14), and for "other causes of groin pain" (bias index: −0.17).
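The prevalence and bias effects described above can be illustrated on a 2 × 2 table with cells a (both examiners classify the entity), b and c (the examiners disagree), and d (neither classifies it). The sketch computes κ, the prevalence index (a − d)/n, and the bias index (b − c)/n. The cell orientation is an assumption: b is taken as surgeon yes and physiotherapist no, so that a positive bias index means the surgeon classified the entity more often. This matches the signs reported above, but the source does not state the convention. The function name and counts are hypothetical.

```python
from typing import Tuple


def agreement_indices(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Return (kappa, prevalence index, bias index) for a 2 x 2 agreement table.

    a: both examiners classify the entity; d: neither does;
    b: surgeon yes, physiotherapist no; c: surgeon no, physiotherapist yes (assumed orientation).
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from each examiner's marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    prevalence_index = (a - d) / n  # often reported as an absolute value
    bias_index = (b - c) / n        # positive: surgeon classified the entity more often
    return kappa, prevalence_index, bias_index


# Two hypothetical tables with the same 56/66 sides in agreement.
unbiased = agreement_indices(10, 5, 5, 46)
biased = agreement_indices(10, 10, 0, 46)
print([round(x, 3) for x in unbiased])
print([round(x, 3) for x in biased])
```

Both hypothetical tables have identical observed agreement. The one in which a single examiner accounts for every disagreement yields the higher κ, which is the overestimation the paragraph describes.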
The inter-examiner bias also highlights the potential limitations of a dichotomized diagnosis or classification. 15 Clinicians' "cut-offs" for classifying athletes according to agreed definitions may differ slightly, meaning that one examiner may have a lower threshold for classifying a specific entity. For example, the definition of iliopsoas-related groin pain ("iliopsoas tenderness and more likely if there is pain on resisted hip flexion and/or pain on hip flexor stretching") allows considerable individual examiner interpretation. If an athlete has mild secondary symptoms reproduced during iliopsoas palpation, but not during stretch or resistance tests, one examiner may classify this as iliopsoas-related groin pain while the other may not. Additionally, it is not uncommon for athletes with groin pain to be unsure exactly where their pain is located and whether the pain recreated during a clinical examination test is the same as their pain during sport. In our study, the bias index found for pubic-related, iliopsoas-related, and inguinal-related groin pain is likely due to a lower cut-off by the general surgeon for classifying these clinical entities, which is also reflected in the average number of clinical entities classified by the surgeon (3.3 per participant) compared with the physiotherapist (2.8 per participant). For "other causes of groin pain," the bias index is mainly a result of the physiotherapist classifying nine sides as "other causes of groin pain" due to pain (also) located at the rectus abdominis, whereas the general surgeon did not. This shows the challenge of differentiating between adjacent structures in the groin area and the lack of clarity about how to classify pain located at or near the distal rectus abdominis insertion.
Subgroup analysis of the four defined clinical entities of groin pain, using the specific clinical criteria of the Doha agreement meeting classification system, showed fair to substantial inter-examiner reliability (κ = 0.23-0.73). These kappa values were higher than those of the primary analysis, in which cases could also be classified at the examiner's discretion (κ = 0.12-0.57). This difference can be explained by the increased subjectivity of deciding to classify a specific entity when not all defined criteria are present. Nevertheless, we believe the discretion-based primary analysis replicates the use of the classification system in clinical practice and therefore has greater generalizability.

| Limitations
Our study has a few limitations. Firstly, the examiners were experienced clinicians (≥10 years of clinical experience) who worked closely together in a multidisciplinary outpatient groin clinic, and both were part of the expert panel at the Doha agreement meeting on groin pain in athletes. 1 Our results may therefore not be generalizable to clinicians with less experience in the diagnosis and clinical management of athletes with groin pain; for example, many physiotherapists are not familiar with the technique of scrotal invagination for assessing the inguinal canal. Secondly, this study was performed in a tertiary referral setting where multiple entities were very common. The study could be repeated in a primary healthcare setting, where single entities are likely to be more prevalent. Thirdly, our study population comprised only male athletes, so the generalizability of our findings to female athletes may be limited. Fourthly, the decision to classify an entity was ultimately at the examiner's discretion. We believe that this replicates clinical practice, where some cases do not fulfill every diagnostic criterion and the clinician's clinical reasoning remains essential. 16 To show the reliability of the classification system when all defined criteria were fulfilled, we presented a subgroup analysis. Fifthly, we cannot exclude the possibility that participants' symptoms increased during or after the first examination and thereby affected a second examination performed on the same day. However, a longer interval between the first and second examinations could have led to a change in symptoms (i.e., disease progression bias). Lastly, new research on the HAGOS using modern test theory reported that conversion tables should be applied to compare the English, Norwegian, and Danish language versions. 17 In our study, we used the original published HAGOS scoring scale, since many participants used the Arabic translation of the HAGOS, which still requires further validation using modern test theory.

| Perspectives
The Doha agreement meeting classification system of groin pain in athletes has received substantial academic interest. It is commonly used by clinicians to classify groin pain in athletes. 7 Our study showed that classifying the groin pain experienced by athletes according to the Doha agreement meeting classification system has slight to substantial inter-examiner reliability. Agreement between examiners was perfect when athletes were classified with a single clinical entity of groin pain, but lower when athletes were classified with multiple clinical entities. Future research should investigate whether classifying groin pain according to the Doha agreement meeting classification alters prognosis and management.

TABLE 3 Subgroup analysis of the inter-examiner reliability of the four defined clinical entities of groin pain strictly according to the defined criteria
Note: N = 48 (66 sides). Subgroup analysis was only performed on entities that were classified according to the specific criteria reported in the Doha agreement meeting classification system (Figure 1). For example, adductor-related groin pain was only classified when the athlete reported recognizable injury pain on adductor palpation and recognizable injury pain in the adductor area during at least one of the hip adduction resistance tests.
Abbreviations: CI, confidence interval; PI, prevalence index.