Introduction

Almost 140,000 cases of acute myeloid leukemia (AML) and 100,000 deaths are reported worldwide per year, with steadily increasing incidence largely due to population growth and aging [1]. While progress has been made, particularly in younger patients, through intensive chemotherapy (IC) and stem cell transplantation (HSCT), the majority of AML patients (> 60 years of age) have historically been considered ineligible for intensive therapies because of comorbidities, more aggressive leukemia biology, and reduced tolerance to intensive therapy. On the other hand, IC in newly diagnosed elderly AML patients, with or without poor performance status, does improve survival when compared to best supportive care [2, 3]. A randomized study of IC vs. best supportive care combined with mild cytoreductive therapy confirmed better survival with IC at comparable hospitalization frequency [4]. Despite these findings, the majority of older patients with AML were not offered IC, and those receiving it had a 5-year survival rate of only 8% [5, 6]. More recently, treatment rates have increased from 35 to 50% following improvements in supportive therapy [7, 8]. Elderly patients are now treated similarly to younger patients with the aim of inducing complete remission (CR) and maintaining long-term remission using consolidation and/or HSCT. Although inferior to results in younger patients, CR rates have improved up to 66.7% [9].

Recent discoveries in biology have enriched treatment options for AML. Modifying epigenetics with hypomethylating agents (HMA) induces CRs with lower toxicity than IC in some pretreated patients and patients with comorbidities [10]. In addition to epigenetics, disturbance in the regulation of apoptosis involving, e.g., bcl-2 has been identified as a common mechanism in AML. The concept of blocking bcl-2 has been tested successfully in refractory disease as monotherapy and in combination with epigenetic therapy in newly diagnosed patients with AML [11]. These treatments lead to CR rates similar to those of IC with a high proportion of molecular remissions and low therapy-related mortality [12, 13].

Inhibition of driver mutations or their products, in combination with chemotherapy, has increased overall survival in sub-groups of newly diagnosed patients with AML [14]. Other targeted therapies such as IDH inhibitors have shown promising results as mono- or combination therapies in phase I and II studies [15,16,17,18].

Many of these new treatment approaches are now being tested in combination with IC as first line therapy, which remains the backbone of therapy even in fit elderly patients. The situation is further complicated by selection bias for eligibility to IC due to increased disease risk and comorbidities. Furthermore, because of disease heterogeneity, determining outcome of low, intermediate, and poor risk disease may be of crucial importance for choosing the best treatment intensity and strategy.

The ideal IC, which balances efficacy against therapy-induced morbidity and mortality without selection bias, still needs to be defined. For this reason, we considered the well-established standard 3 + 7 protocol as baseline and compared its outcome to those of patients treated with the more intensive treatment regimens of two German AML study groups [12, 19, 20]. A randomization ratio of 9:1 was chosen to allow study group specific questions to be answered.

Patients and methods

Patients

Patients ≥ 60 years of age with non-promyelocytic AML were centrally randomized up-front in a 9:1 ratio to the study-specific arms of the German AML Cooperative Group (AMLCG) or the East German Study Group Hematology and Oncology (OSHO), or to a common standard arm (CSA) (suppl. Figure S1). The AMLCG study arm randomized TAD (ara-C 100 mg/m2/d continuous infusion (CI) d1-2 followed by 30-min IV infusion BID d 3–8, daunorubicin 60 mg/m2/d IV d 3–5 and 6-thioguanine 100 mg/m2/d p.o. BID d 3–9) followed by HAM (ara-C 1 g/m2/d IV BID d 1–3 and mitoxantrone 10 mg/m2/d IV d 3–5) versus two courses of HAM ± G-CSF, with the second induction course only applied in case of blast persistence. One course of TAD was given as consolidation followed by maintenance chemotherapy over three years [21]. The OSHO AML04 study included ara-C 1 g/m2/d BID IV d 1 + 3 + 5 + 7 and mitoxantrone 10 mg/m2/d IV d 1 – 3 for one or two induction courses and ara-C 500 mg/m2 BID 1 h IV d 1 + 3 + 5 in combination with mitoxantrone 10 mg/m2/d IV d 1 + 2 as consolidation twice. Pegfilgrastim 6 mg s.c. was given on day 10 of induction and on day 8 of consolidation. Allogeneic related or unrelated HSCT following non-myeloablative conditioning was considered after CR. The CSA consisted of one or two induction cycles of ara-C 100 mg/m2/d CI d 1–7 and daunorubicin 60 mg/m2/d IV d 3, 4, 5 (3 + 7 regimen) followed by two courses of ara-C 1 g/m2/d BID IV d 1 + 3 + 5 as consolidation [20]. Detailed information on the therapies of the study groups and the CSA is given in suppl. Figure S1. Cytogenetic and molecular risk was determined as previously described [22].

Inclusion criteria comprised all consecutive AML cases (de novo, secondary, and therapy related, except APL) diagnosed in the study period. Exclusion criteria included inability of the patient to understand the study and give informed consent, non-AML-related renal insufficiency, liver insufficiency, cardiac insufficiency NYHA III + IV, concurrent acute myocardial infarction, and uncontrolled infection such as pneumonia with hypoxia or septic shock.

The study was approved by the Institutional Review Board (IRB) of the University of Leipzig, registered at clinicaltrials.gov (NCT01497002 and NCT00266136) and the approval notified to IRBs of the participating centers. Patients had given written informed consent prior to study enrollment and randomization.

Definitions and statistical considerations

The primary endpoint of the study was event-free survival (EFS events: no CR/no CR with incomplete hematological recovery (CRi) 90 days after start of therapy, relapse, or death). Secondary endpoints were CR/CRi rate, overall survival (OS, event: death), and relapse-free survival (RFS events: relapse or death). Apart from CR/CRi, patient status 90 days after start of therapy comprised persistent leukemia (≥ 5% blasts after induction therapy), early death (up to 1 week after the end of the first course of induction), death in hypoplasia (death > 1 week after end of first induction treatment in hypoplasia and < 5% blasts), or death from indeterminate cause in case of unknown presence of AML. Apart from the primary endpoint analysis, all other analyses including non-relapse mortality (NRM, event: death in first CR/CRi) and relapse incidence (RI, event: relapse) were explorative and without adjustment for multiple testing.

CR, CRi, and relapse were defined as published previously [22]. EFS and OS were measured from start of therapy until an event was observed. RFS, NRM, and RI were defined as time from CR/CRi to observation of their corresponding events. For patients without an event, all survival endpoints were censored at the date of last follow-up.
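The censoring rule above (patients without an event are censored at last follow-up) is the input that a product-limit (Kaplan–Meier) estimate works from. The following is a minimal sketch of that estimator, not the SAS code used in the study; the function name `kaplan_meier` and the six follow-up times are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps of the product-limit estimate.

    events[i] is 1 if the endpoint event was observed at times[i]
    and 0 if the patient was censored at last follow-up.
    """
    # Survival only drops at times where at least one event was observed.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before t (censored ones included).
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at t.
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - d / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in months for six patients (illustrative, not study data).
months = [3, 5, 5, 8, 12, 20]
observed = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(months, observed)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Censored patients contribute to the number at risk until their last follow-up and are then removed without lowering the curve, which is why incomplete follow-up does not bias the estimate downward.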

The aim of the study was to compare the common standard arm with each study group on its own. Different group-specific arms within a study group were not considered. This was left to the study group internal analysis. Instead, the results of the common standard arm were compared with the results of the general treatment concept of each study group. Thus, no formal test of interaction was performed. Differences in baseline characteristics between the standard arm and the study group arms were investigated by Fisher’s exact test, the Wilcoxon-Mann–Whitney U test, or the Cochran-Armitage trend test [23] as appropriate. Unadjusted probabilities of OS, EFS, and RFS were calculated by the Kaplan–Meier method. To adjust for variations in baseline characteristics with prognostic influence, differences between the survival probabilities of the standard treatment arm and any of the studies’ own groups were judged in a multiple Cox regression model [24] by the Wald test with all influential co-variables included. In addition, direct adjusted survival curves based on the Cox regression model with all influential co-variables stratified for the studies were estimated [25]. Regarding the achievement of CR/CRi after induction therapy, adjustment for prognostic variables was performed through multiple logistic regression [26]. NRM and RI were calculated via cumulative incidence in a competing risk setting, the competing risk being relapse before death for NRM and death in first CR/CRi for RI. For NRM and RI, differences between treatment strategies were assessed utilizing the Fine and Gray model with all significant prognostic co-variables included [27].
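To illustrate the competing-risk setting described above, the following minimal sketch computes nonparametric cumulative incidence functions for relapse (RI) and death in first CR/CRi (NRM), each treated as the competing event of the other. It does not implement the Fine and Gray regression model, and it is not the code used in the study; the function name, the cause coding (0 = censored, 1 = relapse, 2 = death in first CR/CRi), and the data are assumptions made for illustration.

```python
from typing import List, Tuple


def cumulative_incidence(
    times: List[float], causes: List[int], cause_of_interest: int
) -> List[Tuple[float, float]]:
    """Cumulative incidence of one cause in the presence of competing causes.

    causes[i] is 0 for a censored patient, otherwise the code of the
    first event observed (hypothetical coding: 1 = relapse, 2 = death in CR/CRi).
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    event_free = 1.0  # probability of having had no event of any type before t
    incidence = 0.0
    steps = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d_interest = sum(
            1 for u, c in zip(times, causes) if u == t and c == cause_of_interest
        )
        d_any = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        # Only patients still event-free can experience the event of interest.
        incidence += event_free * d_interest / at_risk
        event_free *= 1.0 - d_any / at_risk
        steps.append((t, incidence))
    return steps


# Hypothetical months from CR/CRi for six patients (illustrative, not study data).
months = [2, 4, 6, 6, 9, 15]
causes = [1, 2, 1, 0, 1, 0]
ri_final = cumulative_incidence(months, causes, cause_of_interest=1)[-1][1]
nrm_final = cumulative_incidence(months, causes, cause_of_interest=2)[-1][1]
print(f"RI = {ri_final:.3f}, NRM = {nrm_final:.3f}")
```

Unlike one minus a Kaplan–Meier estimate that censors the competing event, the two cumulative incidences and the event-free probability always add up to one, which is why RI and NRM can be read side by side.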

With respect to the primary endpoint, the null hypotheses were that there would be no difference in the EFS probabilities when the intergroup arm was compared to study group A or to study group B. For each of the two tests, the overall significance level of 0.05 was allowed since data of study group A and study group B came from two independent studies and were not used within the same test. All p values are two-sided. Regarding the primary endpoint comparisons of each study with the corresponding results of the standard treatment, the group sequential design of O’Brien-Fleming [28] with three interim analyses was applied, allowing α = 0.04291 for the final analyses. For final decisions on significance with regard to the comparisons between the standard arm and each study’s treatment strategy, p values of the adjusted multiple regression analyses for OS, EFS, RFS, CR/CRi, NRM, and RI were preferred over those of the unadjusted analyses (log-rank test for OS, EFS, and RFS; Fisher’s exact test for CR/CRi; Gray test for NRM and RI). Use of adjusted analyses was not pre-specified in the protocol for either the primary or the secondary outcome measures, but was deemed preferable due to differences in prognostic factors. However, use of unadjusted analyses did not lead to changes in significance. All analyses were performed with the SAS software version 9.4 (SAS Institute, Cary, NC); all graphical outputs were created using R version 4.1.0 (R Core Team 2013).
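As a rough illustration of how an O’Brien-Fleming design spends the significance level, the sketch below starts from the final α = 0.04291 stated above and back-calculates the critical values at the three interim looks. It assumes equally spaced looks (an assumption; the spacing of the interim analyses is not given here), in which case the O’Brien-Fleming critical value at look k of K is the final critical value multiplied by sqrt(K/k). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Tuple


def obrien_fleming_bounds(final_alpha: float, n_looks: int) -> List[Tuple[int, float, float]]:
    """Return (look, critical |z|, nominal two-sided p) for each analysis.

    Assumes equally spaced looks, so that z_k = z_final * sqrt(n_looks / k).
    """
    std_normal = NormalDist()
    # Two-sided critical value that corresponds to the final nominal alpha.
    z_final = std_normal.inv_cdf(1.0 - final_alpha / 2.0)
    bounds = []
    for k in range(1, n_looks + 1):
        z_k = z_final * sqrt(n_looks / k)
        p_k = 2.0 * (1.0 - std_normal.cdf(z_k))
        bounds.append((k, z_k, p_k))
    return bounds


# Three interim analyses plus the final analysis give four looks.
bounds = obrien_fleming_bounds(final_alpha=0.04291, n_looks=4)
for look, z, p in bounds:
    print(f"look {look}: |z| >= {z:.3f}, nominal p <= {p:.5f}")
```

Under this assumption the early looks require very extreme test statistics, so almost all of the overall 0.05 level remains available for the final analysis.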

Results

Between April 1, 2005, and May 26, 2015, 1286 patients were randomly assigned to the CSA (n = 132) or to the study group arms (n = 1154; Fig. 1). After excluding 139 patients (10.8%) due to violation of inclusion or exclusion criteria, 1147 patients were eligible for analysis (114 of them (9.9%) assigned to the CSA). A total of 1120 patients had follow-up for OS and 1079 patients were available for CR analysis (Fig. 1). Baseline characteristics of all eligible patients showed median ages of 68 (range 60–82) years for the CSA, 70 (60–85) years for study group A, and 69 (60–87) years for study group B (Table 1). The CSA had a significantly different molecular marker distribution compared with study A (p = 0.04), but not with study B. No significantly different distributions were found with respect to the proportions of patients with secondary AML, cytogenetic risk groups, white blood cell counts, and LDH.
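The patient-flow figures in this paragraph can be cross-checked with simple arithmetic. The short sketch below uses only the counts reported above; the variable names are illustrative.

```python
# Counts as reported in the text.
randomized_csa = 132
randomized_study_groups = 1154
excluded = 139
eligible_csa = 114

randomized_total = randomized_csa + randomized_study_groups
eligible = randomized_total - excluded

# Percentages rounded to one decimal place, as in the text.
pct_excluded = round(100 * excluded / randomized_total, 1)
pct_csa = round(100 * eligible_csa / eligible, 1)

print(randomized_total, eligible, pct_excluded, pct_csa)
```

The share of eligible patients in the CSA is close to the 10% expected from the 9:1 randomization ratio.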

Fig. 1 Consort flow diagram. Allocation of AML patients to the arms, eligibility, CR and overall survival analyses

Table 1 Patient characteristics according to the allocation to common standard arm (CSA), study group A and B

Outcome

After 90 days of therapy, 54.0% (95% CI: 45–64) of the patients in the CSA had achieved CR or CRi, which barely differed from the results of the study groups’ own regimens (study group A 53% (95% CI: 47–60) and study group B 59% (95% CI: 56–63); Table 2). Adjusting the comparisons CSA vs. group A and CSA vs. group B by including the significant prognostic variables cytogenetic/molecular risk group, type of disease at diagnosis, WBC, and age in a common logistic regression model, no significant differences between the CR/CRi rates were identified. Overall death rate at 90 days was not significantly different between the CSA (24%) and each of the study groups independently (27% study group A and 19% study group B, Table 2). Persistent leukemia at day 90 was noted in 16% of the standard arm as compared to 12% and 17% in the two study group arms, respectively.

Table 2 Clinical course of patients after treatment in the common standard arm (CSA), arm A and B

The EFS probabilities of the CSA and the two study group regimens (primary endpoint) did not differ significantly (Table 2, Fig. 2). Five-year EFS was 6.2% (95% CI: 2.7 – 14.0) in the CSA, 7.6% (95% CI: 4.5 – 12.8) in study A, and 11.1% (95% CI: 9.0 – 13.7) in study B. In the multivariate analysis, age, type of disease, cytogenetic group, and WBC count at diagnosis were independent prognostic factors, but treatment group was not (Table 3).

Fig. 2 Event-free survival (EFS) of the three arms: common standard arm (CSA), study group A, and study group B

Table 3 Multi-variable Cox-PH regression to identify variables with influence on EFS, OS, and RFS

Median observation time was 67 months. OS did not differ significantly between the CSA and the study groups’ own regimens (Fig. 3). The 5-year survival probability was 17.2% (95% CI: 11.0–26.9) in the CSA, 17.0% (95% CI: 12.0–23.9) in study group A, and 19.5% (95% CI: 16.7–22.8) in study group B. Study group affiliation was not significant for OS, in contrast to age, type of disease, cytogenetic risk group, and WBC at diagnosis (all p < 0.0001; Table 3).

Fig. 3 Overall survival (OS) of the three arms: common standard arm (CSA), study group A, and study group B

The 5-year RFS probability was 13.8% (95% CI: 7.3 – 25.9) in the CSA, 14.6% (95% CI: 9.2 – 23.1) in arm A, and 20.6% (95% CI: 17.1 – 24.8) in arm B, without significant differences (Fig. 4). In the Cox model, only age and cytogenetic risk were statistically significant for RFS; treatment group was not (Table 3).

Fig. 4 Relapse-free survival (RFS) of the three arms: common standard arm (CSA), study group A, and study group B

To adjust survival probabilities of the treatment groups by the significant covariates identified in the respective Cox model, adjusted EFS, OS, and RFS probabilities were computed (suppl. Figure S2, S3 and S4). NRM and RI were estimated, but no statistically significant differences between the treatment groups were observed (Fig. 5). At 5 years, RI amounted to 74.9% (95% CI: 61.9 – 84.1) in the CSA, 65% (95% CI: 55.3 – 73.1) in study A, and 61.0% (95% CI: 56.4 – 65.3) in study B. NRM was calculated for the same patient population, revealing 5-year NRM rates of 11.3% (95% CI: 5.2 – 20.0) in the CSA, 20.4% (95% CI: 13.7 – 28.0) in arm A, and 18.4% (95% CI: 15.0 – 22.0) in arm B.

Fig. 5 Non-relapse mortality (NRM) and relapse incidence (RI) of the three arms: common standard arm (CSA), study group A, and study group B

Discussion

The most widely utilized intensive induction chemotherapy for AML was first published in 1973 [30] and, after further refinements in the 1980s, has been used in its current form ever since [19, 31]. Over this period, a number of clinical trials have investigated induction intensity following dose-dependent efficacy concepts, new drug combinations, and sequential therapies. Somewhat surprisingly, a prospective intergroup analysis in younger (< 60 years) patients with AML compared protocols of differing intensities from five German study groups against a CSA and did not find any statistically significant difference in outcomes [29].

Until two decades ago, it was generally accepted that elderly patients with AML should not be treated intensively because of adverse biology, comorbidities, and dismal survival. This attitude changed after long-term survival was observed in a small proportion of elderly patients after IC and results improved to 19.5% OS at 5 years. In the current randomized inter-group trial, we aimed to focus on induction intensity in patients ≥ 60 years of age by analyzing treatment concepts of different intensities within two German study groups compared to the standard 3 + 7 protocol (CSA) [16]. Comparison of the treatment strategies did not show clinically relevant outcome differences when compared to the CSA in CR rate, EFS, OS, and RFS. The study groups had lower RI, but these differences were not statistically significant and were counteracted by a numerically higher NRM, again with no significant difference. Risk factors for EFS and OS identified in the patients included age, type of disease, cytogenetic risk group, and WBC counts at diagnosis, but not treatment strategy.

The results described in this study are of importance for several reasons. First, efficacy results showed no significant difference between either intensified induction regimen and the established 3 + 7 protocol. This protocol continues to be the reference for further studies exploring combinations with targeted therapies. Second, the results of this multi-center intergroup study suggest improved EFS (6.2%, 7.6%, and 11.1% in the CSA, study group A, and study group B, respectively) and OS probabilities (17.2%, 17.0%, and 19.5%) at 5 years in patients with AML ≥ 60 years as compared with historical controls (OS 8% at 5 years) [6]. This may result from better supportive therapy and standardized clinical management. Third, this trial confirms the feasibility of IC in elderly patients up to 87 years. Age itself influenced EFS and OS, as did cytogenetic risk group and WBC at diagnosis. Finally, AML persistence rates after ≤ 2 induction cycles were higher in this population (14.7%) than in younger patients (7.6%) [9] and the death rate in the first 90 days with and without leukemia was 20.9%. In addition, NRM and RI were not statistically significantly different between the treatment groups.

The randomized design with broad inclusion criteria and a large number of patients is a particular strength of this multi-center study and provides real world information. Our estimation of patients not included in this study is in the range of 20%, which is a clear improvement on previous figures of 35–50% [7, 8]. Potential weaknesses include the small sub-groups of very high risk AML patients (e.g., TP53 mutated), which may show differential response to different treatment strategies. A further limitation of this analysis is the restriction to prognosis based on cytogenetic risk and to FLT3-ITD and NPM1 mutations only, since additional molecular features at diagnosis are unavailable. Furthermore, it is not possible to evaluate the effect of HSCT due to the fact that only a small proportion received this treatment. Randomized studies of the role of HSCT are currently under evaluation [32].

New concepts are needed to further improve the results for elderly patients with AML. Obtaining higher CR rates and increasing the depth of CR might be one way to reach this goal. The use of new delivery formulations such as liposomal formulations (e.g., CPX-351) may be one way of increasing the efficacy of induction chemotherapy [33, 34].

The importance of epigenetic changes in the initiation of AML has been discovered during the last decade and hypomethylating agents (HMA) are increasingly used in patients not eligible for IC and in elderly patients [10, 35]. Although not curative, HMA are able to induce CRs even in pretreated patients and patients with comorbidities and display lower toxicity than IC [10]. This has prompted the practice of response-adapted sequential therapy in elderly patients with AML using HMA initially and then IC in non-responding patients [36]. The results of this study are currently awaited. Further discoveries in the biology of AML are opening new frontiers. In addition to the role played by epigenetic changes, disturbance in the regulation of apoptosis involving bcl-2 has been identified as an important common mechanism in AML. The concept of blocking bcl-2 has been tested successfully in refractory disease as monotherapy and in combination with epigenetic therapy in newly diagnosed patients with AML [11]. These treatments lead to CR rates similar to those of IC with a high proportion of measurable residual disease negative patients and low therapy-related mortality [12, 13]. Randomized studies will show if IC can be replaced either by this combination or even by triple induction therapies to induce CR in patients with AML. Results to date suggest that some responses may be short lived and that development of resistance is the limiting factor for long-term remission. Improvements in consolidation therapy and/or maintenance therapy may be one solution for avoiding relapse caused by resistance to these drugs or drug combinations.

Inhibition of activating driver mutations increases the treatment options in AML. Inhibitors of FLT3 mutations, which are now available with various specificity and potency characteristics, have been studied both as monotherapy and in combination therapy. The addition of TKI to chemotherapy has been shown to increase overall survival and has been approved for newly diagnosed patients [14]. The potential of second generation TKI to induce CR as a low toxicity monotherapy has been tested in relapsed and refractory patients [37, 38]. Second generation FLT3 inhibitors are currently being tested in combination with IC and are expected to further improve results in newly diagnosed patients. While clearly of high interest, this approach is restricted to the one third of all AML patients who have FLT3 mutated disease and often results in resistance or relapse that limits long-term remission. Other targeted therapies such as IDH inhibitors have shown promising results as mono- or combination therapies in phase I and II studies [15,16,17,18].

Meanwhile, the determination of measurable residual disease is enabling quantification and monitoring of the depth of response either by molecular or flow cytometry determination methods. This will allow better evaluation of CR and management of personalized therapy. Reducing treatment related mortality may be another approach to improve outcome of elderly patients with AML.

Finally, the use of HSCT following reduced or non-myeloablative conditioning to decrease the relapse risk is another promising approach. Such protocols have been successfully established in patients up to 75 years and older [39,40,41]. Other consolidation or maintenance therapies including immunological concepts to eradicate the malignant stem cell or clones are being investigated.

In conclusion, more intensive treatment strategies did not show clinically relevant outcome differences when compared to the CSA, but there was an overall long-term improvement compared to previous publications in patients ≥ 60 years with newly diagnosed AML. Intensive chemotherapy remains the backbone for long-term survival. The outcome of this clinical trial provides an important contribution to the selection of IC to be used in combination with targeted or new treatment modalities in future studies involving treatment-naive AML patients. In addition, it shows that an innovative trial design, as in our study, may help to answer important clinical questions without hampering study group specific questions.