Introduction

The identification of the human epidermal growth factor receptor-2 (HER2) as an important cellular marker in the pathophysiology and treatment of breast cancer [1–3] has highlighted the importance of reliable testing methodology [1, 2]. Multiple discussions and publications on this issue have addressed not only the type of test, its reliability, and the definition of “positivity,” but also which tests may best predict the efficacy of anti-HER2 therapies [3–15]. The North Central Cancer Treatment Group (NCCTG) and the National Surgical Adjuvant Breast and Bowel Project (NSABP), both NCI-supported Cancer Cooperative Groups, demonstrated that approximately 3–7 % of breast cancers assessed as HER2-positive in local laboratories were called HER2-normal (IHC < 10 % of cells with circumferential membrane staining; FISH HER2:CEP17 ratio < 2.0) when evaluated centrally [11, 12, 16]. Patients with these HER2-normal breast cancers appeared to derive a similar prolongation of disease-free survival (DFS) from trastuzumab as patients whose HER2-positive status was corroborated in the central laboratories [11–13, 16]. These findings could reflect a true benefit or, alternatively, a wide range of methodologic variables, including differing cutoffs for positivity, reading errors, discordant interpretation among pathologists, and/or intratumoral heterogeneity. To address these critical aspects of HER2 testing, we conducted a round-robin study designed to evaluate HER2 testing and its impact on patient outcomes using samples from two fully annotated specimen banks. The first set included samples from one of the two NCI-sponsored HER2-positive adjuvant trials, NCCTG N9831, which first reported these findings (Perez et al. [12]). The second set came from two separate Breast Cancer International Research Group (BCIRG) adjuvant trials in HER2-positive (BCIRG-006) and HER2-normal (BCIRG-005) breast cancers [17, 18]. The study had four major objectives.
The first was to determine the concordance of HER2 results among the three central laboratories of the NCCTG, BCIRG, and NSABP when testing and reading shared samples. The second was to determine whether discordant cases could be adjudicated at a face-to-face meeting at which all the pathologists simultaneously read the discordant samples at a multi-headed microscope. The third was to evaluate the presence and frequency of intratumoral heterogeneity in HER2 status across multiple blocks from the same patient. Fourth, we sought to determine the impact of trastuzumab therapy in N9831 patients whose tumors initially tested HER2-positive in local laboratories but were found to be HER2-normal by central testing and were subsequently adjudicated as HER2-normal in this round-robin study.

Methods

Specimens

Primary tumor blocks (n = 389) from three adjuvant trials (NCCTG N9831/NCT00005970; BCIRG-005/NCT00312208; BCIRG-006/NCT00021255; clinicaltrials.gov) were sampled from the central tissue banks of the two groups (Mayo Clinic, Rochester, MN and University of Southern California, Los Angeles, CA). Blocks from NSABP B-31 were not made available for this project. Patient tissues were randomly selected from cases with available blocks within prospectively defined subgroups, as previously determined by each group’s independent central review. The subgroups included tumors read centrally as IHC−/FISH−, IHC−/FISH+, IHC+/FISH−, and IHC+/FISH+. Additional criteria were the availability of multiple tumor blocks from the same case and treatment (N9831 specimens only). The blocks comprised 86 blocks from 62 N9831 patients whose disease was centrally classified as HER2-normal by IHC (IHC−; IHC 0, 1+, or 2+) and FISH-negative (FISH−; HER2:CEP17 ratio < 2.0) (Table 1). Disease outcome was available for these 62 N9831 patients, whose tumors had previously been classified as HER2-positive by local testing. We also included 105 blocks from 69 patients (51 N9831 and 18 BCIRG-006) whose HER2 results were discordant between IHC and FISH testing within the central laboratories (33 IHC+/FISH−, 36 IHC−/FISH+). Additional cases included 54 blocks from 37 N9831 and BCIRG-006 patients whose disease was centrally IHC+/FISH+, and 144 blocks from 96 BCIRG-005 patients whose disease was centrally IHC−/FISH−. For 121 patients, two blocks from the same primary tumor site were examined, and for two patients, three blocks were examined (Table 1). This study was approved by the Institutional Review Boards of each participating laboratory.
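The subgroup counts above can be cross-checked against the multi-block totals. The following is a minimal Python sketch of that bookkeeping; it assumes, as the text implies but does not state outright, that every patient not among the 121 two-block and two three-block patients contributed exactly one block. Variable names are illustrative only.

```python
# (patients, blocks) per centrally defined subgroup, as reported in Methods.
subgroups = {
    "N9831 IHC-/FISH-": (62, 86),
    "N9831/BCIRG-006 IHC/FISH discordant": (51 + 18, 105),
    "N9831/BCIRG-006 IHC+/FISH+": (37, 54),
    "BCIRG-005 IHC-/FISH-": (96, 144),
}

# The discordant subgroup splits into 33 IHC+/FISH- and 36 IHC-/FISH+ patients.
assert 33 + 36 == subgroups["N9831/BCIRG-006 IHC/FISH discordant"][0]

total_patients = sum(patients for patients, _ in subgroups.values())
total_blocks = sum(blocks for _, blocks in subgroups.values())

# Assumption: everyone outside the multi-block counts contributed one block.
two_block_patients = 121
three_block_patients = 2
one_block_patients = total_patients - two_block_patients - three_block_patients

implied_blocks = (one_block_patients
                  + 2 * two_block_patients
                  + 3 * three_block_patients)

print(total_patients, total_blocks, implied_blocks)  # 264 389 389
```

Under that assumption the 264 patients account for exactly the 389 blocks reported.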

Table 1 Patient/disease characteristics

Specimen de-identification

The NCCTG and BCIRG submitted their respective tissue blocks to a central laboratory (Mayo Clinic), where the blocks were sectioned and re-identified, as described in the online supplement. This re-identification process ensured that neither the contributing organization nor the individual subject could be identified without a coding key, which was retained by the central statistical office to keep all reading pathologists blinded to the original classification of cases.

HER2 testing methods

The IHC HercepTest™ kit was used to determine HER2 protein expression according to the manufacturer’s instructions (Dako, Carpinteria, CA). The FISH PathVysion® HER2 DNA probe kit/HER2/CEP17 probe mixture (Abbott Molecular, Des Plaines, IL) was used to determine HER2 gene and chromosome 17 copy number in each of 60 nuclei, with slight modifications by each laboratory (Supplemental Table 1). HER2 positivity was defined according to the FDA-approved guidelines used in the clinical trials (IHC+: uniform, intense circumferential membrane staining in >10 % of invasive tumor cells; FISH+: HER2/CEP17 ratio ≥ 2.0). The BCIRG central laboratory also required an average HER2 copy number of ≥ 4.0 copies per tumor cell nucleus for a case to be scored as amplified [19]; the other central laboratories did not have this requirement. The HER2 status (IHC: 3+ vs. 0–2+; FISH: amplified vs. not amplified) of each block was determined independently at each site.
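The positivity definitions above can be expressed as simple decision rules. This is a minimal sketch, not any laboratory’s scoring software: the function names are hypothetical, and the FISH rule assumes that signals are tallied as totals over the counted nuclei (60 by default). Each laboratory’s actual counting modifications are given in Supplemental Table 1.

```python
def ihc_positive(ihc_score: str) -> bool:
    """IHC+ means a 3+ score (uniform, intense circumferential membrane
    staining in >10 % of invasive tumor cells); 0, 1+ and 2+ are IHC-."""
    return ihc_score == "3+"


def fish_amplified(her2_signals: int, cep17_signals: int, nuclei: int = 60,
                   require_copy_number: bool = False) -> bool:
    """FISH+ means a HER2:CEP17 ratio >= 2.0.

    With require_copy_number=True (the additional BCIRG central-laboratory
    rule), the mean HER2 copy number per nucleus must also be >= 4.0.
    """
    ratio = her2_signals / cep17_signals
    mean_her2 = her2_signals / nuclei
    if require_copy_number:
        return ratio >= 2.0 and mean_her2 >= 4.0
    return ratio >= 2.0


# Hypothetical counts: 150 HER2 and 70 CEP17 signals in 60 nuclei.
print(fish_amplified(150, 70))                            # True  (ratio ~2.14)
print(fish_amplified(150, 70, require_copy_number=True))  # False (mean HER2 2.5)
print(ihc_positive("2+"))                                 # False
```

The hypothetical case (ratio about 2.14, mean HER2 copy number 2.5) shows how the extra BCIRG copy-number requirement can change the call for identical signal counts.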

Adjudication

IHC and FISH cases that were discordant among the three pathologists were adjudicated jointly by the three groups at a face-to-face meeting to determine, in a blinded fashion, the presence or absence of the HER2 alteration. This review was in addition to the re-analysis of these cases that each pathology group had already performed individually. In cases where consensus was not reached, slides were re-assayed (stained and scored) by FISH at the University of Southern California, and the stained slide was then sent to the other two central laboratories for scoring. After completion of all pathology and adjudication activities, the data were unblinded and analyzed by the statistician (ACD).

Statistical analysis

A block was included in a given analysis only if it had two or more pathology reads. The adjudicated score, or the majority score for the remaining cases in which consensus was not reached (N = 14 for IHC, N = 12 for FISH), was used in all statistical analyses. The percent agreement (HER2-positive vs. HER2-normal) between the original central review result and the final round-robin HER2 result of the primary block (defined as the block used in the original central review for clinical trial eligibility; one per patient) was computed in the subsets of BCIRG-005 HER2-normal, N9831 HER2-normal, and N9831/BCIRG-006 HER2-positive specimens (all by central review). Agreement among blocks within patients in the same subsets of specimens was also estimated using percent agreement.
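The scoring rule and agreement statistic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study’s analysis code: the function names and example reads are hypothetical, and a block without a strict majority among reads is treated as an error rather than resolved in any particular way.

```python
from collections import Counter
from typing import Optional, Sequence


def final_score(reads: Sequence[str], adjudicated: Optional[str] = None) -> str:
    """Return the score used in the analysis for one block.

    The consensus score from adjudication wins when one exists; otherwise
    the majority of the independent pathology reads is used.
    """
    if len(reads) < 2:
        raise ValueError("a block needs two or more pathology reads")
    if adjudicated is not None:
        return adjudicated
    score, count = Counter(reads).most_common(1)[0]
    if count * 2 <= len(reads):
        raise ValueError("no strict majority among the reads")
    return score


def percent_agreement(first: Sequence[str], second: Sequence[str]) -> float:
    """Percentage of paired results (e.g. central review vs. round-robin) that match."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty result lists of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


print(final_score(["2+", "2+", "3+"]))                    # 2+
print(final_score(["2+", "2+", "3+"], adjudicated="3+"))  # 3+
print(percent_agreement(["pos", "neg", "neg", "pos"],
                        ["pos", "neg", "pos", "pos"]))    # 75.0
```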

Disease-free survival was defined as the time from randomization to first local, regional, or distant recurrence, contralateral breast cancer, another primary cancer (except squamous or basal cell skin cancer, carcinoma in situ of the cervix, or lobular carcinoma in situ of the breast), or death from any cause. DFS of N9831 patients whose breast cancer was HER2-positive according to the local pathologist but HER2-normal (IHC−/FISH−) by central review and by adjudication (on all blocks) was plotted by arm using Kaplan–Meier curves and compared using a Cox proportional hazards model stratified by hormone receptor and nodal status. Statistical significance was defined as a two-sided p value < 0.05 throughout.
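The DFS comparison above combines a Kaplan–Meier description with a stratified Cox model. As a minimal sketch of the descriptive part only, the product-limit estimator can be written in plain Python; the function name and follow-up data are hypothetical, and the stratified Cox model (normally fitted with a dedicated survival-analysis package) is not reproduced here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate as a list of (event time, S(t)) steps.

    events[i] is 1 if a DFS event was observed at times[i] and 0 if the
    patient was censored then. Patients censored at an event time are
    still counted as at risk at that time (the usual convention).
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all observations that share this time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            steps.append((t, survival))
        at_risk -= n_leaving
    return steps


# Hypothetical follow-up times (years) and event indicators for one arm.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In this toy arm, survival drops to 0.8, 0.6, and 0.3 at years 1, 2, and 3; the patient censored at year 4 produces no step.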

Role of the funding source

The funding source had no role in study design, data, or writing. The corresponding author, Dr. Edith Perez, had full access to all data in the study and had final responsibility for the decision to submit for publication.

Results

HER2 concordance and adjudication

Independent reads were concordant across the three pathologists by IHC in 351/381 (92 %) cases and by FISH in 343/373 (92 %) cases (Fig. 1). Consensus was reached on 16/30 discordant IHC and 18/30 discordant FISH cases (Fig. 1). Thus, adjudication led to consensus in 367/381 (96 %) and 361/373 (97 %) of IHC and FISH cases, respectively.
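The concordance percentages above follow directly from the block counts. A minimal arithmetic check in Python (the helper `pct` is a hypothetical name) reproduces the reported rounded values and the number of cases left unresolved after adjudication.

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * numerator / denominator)


ihc_total, ihc_concordant, ihc_resolved = 381, 351, 16
fish_total, fish_concordant, fish_resolved = 373, 343, 18

# Independent reads before adjudication.
print(pct(ihc_concordant, ihc_total), pct(fish_concordant, fish_total))  # 92 92
# After adjudication of the 30 discordant IHC and 30 discordant FISH cases.
print(pct(ihc_concordant + ihc_resolved, ihc_total),
      pct(fish_concordant + fish_resolved, fish_total))  # 96 97
# Cases still without consensus.
print(ihc_total - ihc_concordant - ihc_resolved,
      fish_total - fish_concordant - fish_resolved)  # 14 12
```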

Fig. 1

Overall concordance. The number of blocks showing concordance/discordance in IHC and FISH testing among three central laboratories. *Retest: 19 of the original 30 discordant FISH cases were not adjudicated at the face-to-face meeting. These 19 cases were re-assayed (stained and scored) by FISH at USC, and the stained slide was then sent to the other two central laboratories for scoring

Fourteen (4 %) IHC and 12 (3 %) FISH cases could not be adjudicated after face-to-face review and after re-testing of the FISH cases. Only one case appeared in both sets. Nine of the 14 (64 %) non-adjudicated IHC cases had a two-thirds majority score of IHC 2+. Nine of the 12 (75 %) non-adjudicated FISH cases had a two-thirds majority score of non-amplified. The 12 non-adjudicated FISH cases had HER2:CEP17 ratios spanning the 2.0 cut-off, ranging from 1.54 to 2.36 (average: 2.01; NCCTG), 1.13–2.22 (average: 1.72; NSABP), and 1.43–2.45 (average: 1.92; BCIRG).

Of 373 blocks with both an adjudicated IHC and FISH result, the overall concordance between IHC and FISH was 92 % (343/373). Among the IHC-negative blocks, concordance with FISH-negativity was 94 % (264/281), and among the IHC-positive blocks, concordance with FISH-positivity was 86 % (79/92). Among FISH-negative blocks, concordance with IHC-negativity was 95 % (264/277), and among the FISH-positive blocks, concordance with IHC-positivity was 82 % (79/96).
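The four concordance figures above are all derived from a single IHC × FISH 2 × 2 table whose off-diagonal cells can be recovered from the reported counts. A minimal Python sketch of that reconstruction (variable names are illustrative):

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Reported counts.
both_neg = 264        # IHC-negative and FISH-negative
both_pos = 79         # IHC-positive and FISH-positive
ihc_neg_total = 281   # all IHC-negative blocks
ihc_pos_total = 92    # all IHC-positive blocks

# Off-diagonal (discordant) cells implied by those counts.
ihc_neg_fish_pos = ihc_neg_total - both_neg   # 17
ihc_pos_fish_neg = ihc_pos_total - both_pos   # 13

fish_neg_total = both_neg + ihc_pos_fish_neg  # 277
fish_pos_total = both_pos + ihc_neg_fish_pos  # 96
n = ihc_neg_total + ihc_pos_total             # 373

print(pct(both_neg + both_pos, n))            # 92 overall
print(pct(both_neg, ihc_neg_total),           # 94 FISH- among IHC-
      pct(both_pos, ihc_pos_total))           # 86 FISH+ among IHC+
print(pct(both_neg, fish_neg_total),          # 95 IHC- among FISH-
      pct(both_pos, fish_pos_total))          # 82 IHC+ among FISH+
```

The implied table has 17 IHC−/FISH+ and 13 IHC+/FISH− blocks, which is consistent with every percentage reported in the paragraph.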

Concordance between round-robin and original central review result

In the primary block of 96 BCIRG-005 patients with HER2-normal disease, IHC and FISH negativity were confirmed in all 96 (100 %) cases (Table 2). In the primary block of 59 evaluable N9831 central HER2-normal cases, IHC and FISH negativity were confirmed in 57 (97 %) cases (Table 2). In the primary block of 102 N9831/BCIRG-006 HER2-positive cases, HER2 positivity was confirmed in 73 (72 %) cases (Table 2); the remaining 29/102 primary blocks were either FISH-negative or IHC-negative locally, and FISH-negative and IHC-negative centrally. Among these 102 primary blocks, all 36 (100 %) BCIRG-006 blocks originally classified as HER2-positive were consistently evaluated as FISH-positive (the definition of “HER2-positive” in the BCIRG-005/BCIRG-006 trials), and 34 (94 %) were also concordantly considered IHC-positive (IHC was not assessed for entry to the BCIRG trials) (Table 2). In the primary block of 66 cases with centrally discordant IHC/FISH status, the adjudicated result agreed with the central result in 14 (21 %) cases (Table 2).

Table 2 The concordance between central and adjudicated HER2 status in the primary block

Block-to-block intratumoral HER2 heterogeneity overall

Among the 121 patients with two tissue blocks and the two patients with three tissue blocks available for analysis (Tables 1, 3), IHC results were obtained for 118 patients, and the adjudicated IHC result agreed across blocks in 106 (90 %) (Table 3). Among 113 patients with FISH results for more than one block, the adjudicated FISH result agreed across blocks in 107 (95 %) (Table 3). Of the 22 N9831 patients with HER2-normal (IHC-negative/FISH-negative) disease and duplicate blocks, 5/22 (23 %) tested positive (by IHC and/or FISH) in at least one of the duplicate blocks (Table 3), clearly demonstrating heterogeneous HER2 gene copy number and/or protein immunostaining within the same tumor (Fig. 2). Moreover, the focal HER2-amplified region corresponded to the areas of HER2 protein overexpression, confirming the presence of a population of HER2-positive cells in these tumors that had been called HER2-normal (Fig. 2a, b).
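The block-to-block analysis above reduces to two per-patient checks: whether all adjudicated blocks agree, and whether any block is HER2-positive. A minimal Python sketch, assuming each patient is represented as a list of adjudicated block statuses (the function names, status labels, and patients are hypothetical):

```python
from typing import Dict, List


def block_agreement(patients: Dict[str, List[str]]) -> float:
    """Percent of multi-block patients whose adjudicated status is identical across blocks."""
    multi = [statuses for statuses in patients.values() if len(statuses) > 1]
    if not multi:
        raise ValueError("no patient has more than one block")
    agreeing = sum(len(set(statuses)) == 1 for statuses in multi)
    return 100.0 * agreeing / len(multi)


def positive_in_any_block(statuses: List[str]) -> bool:
    """True if at least one block from the tumor was adjudicated HER2-positive."""
    return "pos" in statuses


# Hypothetical patients: lists of adjudicated block statuses ("pos"/"neg").
example = {
    "A": ["neg", "neg"],
    "B": ["neg", "pos"],         # heterogeneous: second block positive
    "C": ["pos", "pos", "pos"],
    "D": ["neg"],                # single block, excluded from agreement
}
print(round(block_agreement(example), 1))                       # 66.7
print(sum(positive_in_any_block(s) for s in example.values()))  # 2
```

Applied to the reported counts, the same definition gives 106/118 (about 90 %) for IHC and 107/113 (about 95 %) for FISH.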

Table 3 Intratumoral heterogeneity
Fig. 2

Intratumoral HER2 heterogeneity. HER2 protein and gene/chromosome heterogeneity in the same tumor. a HER2 gene amplification. Representative FISH staining demonstrating a focal HER2-amplified clone that corresponds to the area of HER2 protein overexpression in b. b Variable HER2 IHC protein immunostaining. The area identified shows IHC 3+ immunostaining, while the remainder of the microscopic field shows IHC 2+ immunostaining. c Representative FISH staining demonstrating polysomy 17 in the same tumor as in a and b

Block-to-block intratumoral HER2 heterogeneity and patient outcome in N9831 central and adjudicated HER2-normal cases

Of the original 103 N9831 patients identified as locally HER2-positive but centrally HER2-normal, tissue was available for 62, of whom 59 were evaluable; the other three cases were technical failures due to insufficient tumor tissue. Of the 59 evaluable patients, 53 (90 %) had disease adjudicated as IHC-negative/FISH-negative for HER2. Among the 22 of these 53 with more than one block available for analysis, 1/22 (5 %) was adjudicated as HER2-positive (by either IHC or FISH) in the primary block and 4/22 (18 %) had a second block adjudicated as HER2-positive (by either IHC or FISH) (Tables 2, 3). Among the 53 N9831 HER2-normal cases adjudicated as IHC-negative and FISH-negative (despite a prior HER2-positive local test), there was an unadjusted trend toward improved DFS with trastuzumab given concurrently with paclitaxel after doxorubicin/cyclophosphamide (AC-TH) compared with chemotherapy alone (AC-T) (HR = 0.31, p = 0.06, 95 % CI 0.11–0.91; AC-T: 23 pts, ten events; AC-TH: 30 pts, five events). After adjustment for hormone receptor and nodal status, the improvement in DFS associated with trastuzumab administered concurrently with chemotherapy was not statistically significant, based on a relatively small number of events (HR = 0.34, p = 0.06, 95 % CI 0.11–1.05; AC-T: 23 pts, ten events; AC-TH: 30 pts, five events) (Fig. 3). When only the few cases in which two blocks were both adjudicated as IHC-negative/FISH-negative were considered (n = 17/53), a similar non-significant trend toward DFS improvement was observed (HR = 0.29, p = 0.16, 95 % CI 0.05–1.65; AC-T: 8 pts, four events; AC-TH: 9 pts, two events).
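A reported hazard ratio, its 95 % confidence interval, and its p value can be roughly cross-checked against each other. The sketch below assumes a Wald-type interval that is symmetric on the log scale (log HR ± 1.96 × SE); the function name is hypothetical, and p values from a likelihood-ratio or log-rank test, or from very small samples, need not match this approximation exactly.

```python
import math


def wald_p_from_ci(hr: float, lower: float, upper: float,
                   z_crit: float = 1.959964) -> float:
    """Approximate two-sided Wald p value for a hazard ratio.

    Assumes the 95 % CI is log(HR) +/- z_crit * SE, so the standard error
    can be recovered from the width of the interval on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = abs(math.log(hr)) / se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# Adjusted analysis reported above: HR = 0.34, 95 % CI 0.11-1.05.
p_adj = wald_p_from_ci(0.34, 0.11, 1.05)
print(round(p_adj, 2))  # 0.06
```

For the adjusted analysis this approximation gives p of about 0.06, in line with the reported value.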

Fig. 3

Kaplan–Meier curves of DFS in N9831 patients with IHC−/FISH− disease. All patients had IHC−/FISH− disease by central review and all blocks adjudicated in the current study as IHC−/FISH−

Discussion

The importance of HER2 as a prognostic marker in invasive breast cancer is well established [1, 2]. As such, it is critical to validate and standardize testing strategies so that HER2 status is assessed accurately [3, 6]. The value of the current study is heightened by the provocative results from N9831 and NSABP B-31, which showed that patients whose tumors were classified as HER2-normal by central testing (although originally testing positive in local laboratories) appear to derive a benefit from adjuvant trastuzumab-based therapy, compared with chemotherapy alone, with hazard ratios similar to those of centrally confirmed HER2-positive patients [12, 16]. If there are patients who could benefit from this well-tolerated, effective therapy but who may be misclassified by current HER2 testing (false negatives), newer methods should be evaluated to identify these patients more reliably. Similarly, we need to determine whether some patients receiving trastuzumab are unlikely to benefit [9, 20]. The main objective of this study was to address critical aspects of HER2 testing through an actual collaborative methodologic evaluation rather than a consensus review of literature published by independent groups [9]. To accomplish this, we conducted a round-robin study among pathologists from three central laboratories using blocks from two HER2-positive adjuvant trastuzumab trials (N9831 and BCIRG-006) and one HER2-normal trial (BCIRG-005) to evaluate current HER2 testing methods and their potential impact on clinical outcomes in tumors from annotated trials.

Before adjudication, the discordance rate for HER2 status (by both IHC and FISH) among the three expert pathologists in these cases was 8 %. After adjudication, agreement among these same pathologists was ≥ 96 %, suggesting that interpretation issues and/or HER2 tumor heterogeneity may play a significant role in discordant results. The overall concordance between the adjudicated IHC and FISH results was 92 %.

Similar to the results of an international HER2 proficiency study performed among five central laboratories [21], most samples that could not be successfully adjudicated had equivocal IHC or FISH results as defined by the ASCO/CAP guidelines for HER2 positivity [9]. Of the 14 IHC cases that could not be successfully adjudicated, 64 % were classified as 2+. The circumferential distribution and character (intensity, granularity) of cell staining, rather than the proportion of stained cells, were the main reasons for discordance among the round-robin pathologists, similar to what was reported in a recent HER2 proficiency testing study [22]. The 12 non-adjudicated FISH cases had an average HER2:CEP17 ratio of 1.88 (range 1.13–2.45) and an average HER2 copy number of 4.67 per nucleus across pathologists, both of which are near the cut-offs of 2.0 for the ratio and 4.0 for copy number.

Although the absolute HER2 counts were similar across the reading pathologists, small differences in the CEP17 counts (the denominator) can and did substantially affect the HER2:CEP17 ratio and changed the amplification status. As a result, the interpretation of HER2 gene amplification differed in these cases. These data indicate that in equivocal cases, HER2 gene and CEP17 copy numbers should be assessed independently, which may have important clinical implications [23].
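The denominator effect described above is easy to see numerically. In this minimal sketch, the per-nucleus counts are hypothetical values chosen near the cut-offs: two readers obtain the same mean HER2 count but slightly different mean CEP17 counts, and the ratio falls on opposite sides of 2.0.

```python
def her2_cep17_ratio(her2_per_nucleus: float, cep17_per_nucleus: float) -> float:
    """HER2:CEP17 ratio from mean signals per nucleus."""
    return her2_per_nucleus / cep17_per_nucleus


her2_mean = 4.2  # hypothetical mean HER2 signals per nucleus, same for both readers
for reader, cep17_mean in [("reader 1", 2.05), ("reader 2", 2.15)]:
    ratio = her2_cep17_ratio(her2_mean, cep17_mean)
    status = "amplified" if ratio >= 2.0 else "not amplified"
    print(f"{reader}: ratio {ratio:.2f} -> {status}")
# reader 1: ratio 2.05 -> amplified
# reader 2: ratio 1.95 -> not amplified
```

A difference of 0.1 CEP17 signals per nucleus flips the ratio-based call. Judged on HER2 copy number alone (4.2 per nucleus against a 4.0 threshold), the two reads would agree.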

When HER2 results are in the equivocal range, pathologists should consider consulting a second pathologist to corroborate or possibly adjudicate the HER2 status. An accompanying explanation or interpretation of the HER2 status from the pathologist(s) is critical to help guide the clinician in making appropriate management decisions [4]. The patient also needs to be informed of the challenges associated with HER2 testing, particularly in cases near the FDA-recommended FISH ratio cut-off of 2.0. A trend toward benefit from trastuzumab (adjusted HR = 0.34; p = 0.06) was observed in the small subset of N9831 patients (n = 53) whose disease was deemed HER2-normal by central review and confirmed on the limited number of blocks in the round-robin (although initially called HER2-positive locally). While this observation is based on a very small number of events, we recognize that trends toward benefit are important to document, even when they do not reach statistical significance. An alternative and equally plausible explanation for the observed benefit in HER2-normal cases is the heterogeneity in both HER2 protein overexpression and gene amplification that we observed when assessing more than one block from the same patient. The current data indicate that 5/22 patients (23 %) whose tumors had tested HER2-positive locally in N9831 and were subsequently called IHC-negative/FISH-negative centrally had a second block from the same primary tumor that we found to be HER2-positive in the round-robin analysis. This phenomenon could account for some of the discrepancies between local and central HER2 testing in N9831 and NSABP B-31. Accordingly, testing an additional portion of the tumor in “HER2-normal” cases may be advisable to avoid withholding HER2-targeted therapy from patients who might benefit from it. One option would be to test a second section from an additional tumor block for each patient whose initial HER2 testing yields normal results.

Block-to-block heterogeneity was more frequent for protein overexpression by IHC (10 %) than for amplification status by FISH (5 %). Variable rates of intratumoral HER2 heterogeneity (<1–30 %), of unknown clinical significance, have also been reported [15, 24–28]. In addition, we occasionally observed distinct intratumoral heterogeneity within the same tumor section (Fig. 2). When distinct populations of cells exist in the same section, the HER2 status (by IHC and FISH) should be reported in accordance with established guidelines [9].

Approximately 8–10 % of HER2-amplified breast cancers are falsely negative (IHC 0 or 1+) by IHC HER2 testing [14, 15], and given the high incidence of breast cancer in the US, this rate of false negativity could critically affect the care of more than 5,000 patients each year [8]. Given the approximately 50 % reduction in relapse events when anti-HER2 targeted therapy is included in the adjuvant setting, thousands of women in the US each year may be experiencing relapses after failed adjuvant regimens that might not have occurred had they been correctly identified as having HER2-positive disease and initially treated with trastuzumab- or lapatinib-based regimens. Improving the accuracy of HER2 testing and reducing the incidence of false-negative results can directly save lives [29].