I’m a bit troubled by the judgment of the Court of Appeal in Servier Laboratories v NICE, in which the Court has granted Servier’s appeal, quashing the National Institute for Clinical Excellence’s decision not to recommend Servier’s drug Protelos (strontium ranelate) for use within the NHS for the prevention of fractures in patients with osteoporosis.
Protelos has been licensed for use throughout the EU by the European Medicines Agency, which considered the drug in 2001 (some drugs are considered by the EMA at EU level, bypassing national authorisation by our own MHRA). As part of its assessment, the EMA asked for some evidence that Protelos prevents hip fractures in particular, and suggested Servier identify among the patients who’d taken part in its clinical trial a subgroup of women who would be at an enhanced risk of suffering a hip fracture. Servier did so, and analysis of those figures seems to show, within the subgroup, that women on the drug had sustained 36% fewer hip fractures than women on placebo. That degree of efficacy compared closely with that of alendronate – NICE’s recommended drug. On that basis, the EMA was content to authorise the use of Protelos in Europe.
Why the EMA asked for that analysis, I don’t know. It’s important to note though that the subgroup analysis is not itself randomised clinical trial data. It’s a selection of data made after the fact, and so inherently vulnerable to the biases that randomisation in clinical trials is designed to prevent.
When NICE looked at the drug, though, it was less happy with this evidence. Its guidance to drug companies says:
There should be a clear clinical justification and, where appropriate, biological plausibility for the definition of the patient subgroup and the expectation of a differential impact. Ad hoc data mining in search of significant subgroup effects should be avoided.
When NICE asked experts at Sheffield University to evaluate Protelos, they weren’t impressed with the subgroup analysis, which they thought essentially unreliable, because comparisons were not being made between randomly selected groups. Servier disagreed, and made submissions reminding NICE that the EMA had been satisfied with the robustness of the data, and arguing that the subgroups were effectively “randomised”, since they had been well balanced for baseline characteristics. I don’t myself see how judicious post-hoc selection can ever be equated to genuine advance randomisation, but perhaps that’s a result of my scientific ignorance rather than the Court of Appeal’s. In any case, NICE’s Appraisal Committee decided against Servier, whereupon Servier appealed; and on appeal, NICE’s Appeal Panel decided it had been right to reject the evidence of the subgroup analysis.
Whereupon, Servier went to the courts, arguing on judicial review that NICE’s decision was not properly reasoned, and irrational. Again at first instance they lost on this point. But now the Court of Appeal has finally found in Servier’s favour.
The Court’s reasoning is that NICE in the reasoning it gave for its decision never expressly adopted the Sheffield experts’ view that the subgroup analysis was not randomised. Nor did it expressly give reasons for rejecting Servier’s arguments about randomisation. So far, I follow. Perhaps – just perhaps – it’s right to quash a decision like this on the sole basis that it could and should have been more fully reasoned. I’m not sure in this particular case that it would be right, given that it must have been obvious to Servier at every point in the procedure why its data was not accepted. Indeed, the submissions it actually made indicate clearly that it was well aware NICE shared the Sheffield view that the subgroup analysis was unreliable because not truly randomised. I am genuinely surprised that Lady Justice Smith says she doesn’t know why NICE reached the decision it did.
But the judges go further than this – and that’s what causes me concern. Lady Justice Smith says (para. 40) she has grave doubts about the rationality of NICE’s decision – not merely about the adequacy of the reasons NICE gave for it. She follows Servier in placing great emphasis on the fact that the data was good enough for the European Medicines Agency, and goes on (para. 52)
By making these references to EMA I am not, of course, suggesting that NICE is bound by a prior decision of that body. However, I would expect to see some reason given for NICE reaching a different view from a body of similar standing.
Lord Justice Wilson agrees (para. 62):
It is not suggested that NICE are bound by EMA’s decision or its reasoning but the appellants are entitled to expect any decision against them to be properly reasoned, especially when it is contrary to the reasoned decision of an equally eminent body.
But this surely goes too far, and does indeed bind NICE to follow the EMA at least in the sense that it must do so unless it provides good reasons why not. Servier certainly present it that way in their press release.
Both judges seem mesmerised by the fact that the EMA asked for this data. But firstly, it’s not clear that should matter since the EMA’s decision was of a different kind – whether Protelos could be used at all in the EU. Accepting data as supporting mere authorisation (when the alternative would be to deny Protelos access to the EU medicines market at all for prevention of hip fractures) is not the same, it seems to me, as accepting the same data to support Servier’s case that Protelos should be funded by the NHS rather than some other drug – especially when even the Court of Appeal recognised that the contentious data does not show Protelos is more effective than NICE’s preferred treatment. It’s far from clear that it would have made any real difference to its recommendations even if NICE had accepted the subgroup analysis – but the judges don’t seem to have thought much about that. They simply accepted Servier’s contention that the point is not academic and the outcome might somehow be favourable to its drug.
Secondly, though, neither judge seems to consider the possibility that the EMA might have got it wrong by asking for subgroup data precisely “in search of significant subgroup effects”, as NICE’s guidance cautions against. Why should NICE have to justify its departure from the EMA’s approach, unless the EMA is equally required now to justify its approach in the light of NICE’s? Repeatedly the judges suggest it would be “surprising” if the EMA had accepted flawed results. But that’s merely an assumption based on a perception of the EMA’s authority. Nowhere do the judges quote the EMA as giving adequate reasoning – of anything approaching the exhaustiveness the judges are requiring from NICE – to justify its decision to accept the data.
For NICE to have to go through this decision again, being in effect barred, against its better scientific judgment, from rejecting the subgroup analysis on principle because selected post hoc, is bad enough. It looks very much as though the judges have been persuaded by Servier’s arguments about the data, and have preferred their own scientific judgment about its relevance to that of the actual experts.
Yet worse is the suggestion that NICE’s decisions may in future be legally reviewable in terms of their agreement or otherwise with the uncritically accepted approach of some other body, set up for another purpose.
Carl,
Do you think this one might be appealed to the Lords?
I doubt it, to be honest. If the case had been decided simply on the basis that good reasons were needed to depart from the EMA’s approach, then perhaps that would have raised a point of law of general public importance (which is what’s needed for an appeal to the Supreme Court). But the Court of Appeal has put most of the emphasis on what it says is defective reasoning in this one decision, and it’s clear the appeal was allowed on that point alone – it’s difficult to see how that can be made into a legal point of general importance, I’m afraid. Unfortunately though I think Smith LJ’s other obiter points will influence the thinking of lawyers in the pharmaceutical industry and those who advise NICE. It’d be a bold lawyer who advised them, after this, that they could safely refuse to accept data accepted by the EMA or MHRA, without giving good reasons for that decision.
Isn’t this decision in some ways at odds with the kind of approach the Court of Appeal just outlined in the BCA v Singh libel case appeal on meaning, where they were saying that the courts shouldn’t be the place to resolve disputes about scientific evidence?
And as you say, the underlying parameters would seem rather different for a decision in Europe (where presumably they were only assessing “Is the drug safe and effective”) and in the UK where NICE explicitly assess “Is it effective and value for money”. On that basis the decision seems rather perverse.
This judgement seems to stem from a failure to understand how science, and in particular NICE, works. NICE does not go around recommending drugs willy-nilly, simply because they haven’t been proved not to work. In any case, it is impossible to prove drugs don’t work (except, perhaps, to the extent that they work in fewer than one in a thousand, or some other proportion, of patients).
Any scientific experiment starts from a null hypothesis, rather like the presumption of innocence in a criminal trial. Testing a drug must start from the presumption that it does not work. NICE didn’t choose between a valid but statistically insignificant study and a statistically significant but unsound one; neither proved the drug worked. Therefore NICE should not recommend it.
Thanks Carl for a thought-provoking, informative analysis
My comments reflect my background in medicine (and ignorance of law), and a quick scanning of the judgement and NICE’s guideline — google could not find me the relevant NICE/ScHARR TAR and EMEA guidelines within my limits of patience.
It seems to me that NICE/ScHARR made the correct decision, but defended it badly.
In general, results from subgroup analyses should be regarded as being at high risk of bias. To explain this by saying that subgroups are inadequately randomized (as the judgement implies) amounts to little more than hand waving. One would expect the problems with subgroup analyses to be explained in a bit more detail.
Results from single trials and from subgroup analyses from only one trial are less reliable than results from multiple independent trials.
Therefore on general principles, one would assess the data on strontium ranelate as less reliable than the data on alendronate.
The judges were correct to emphasize the significance of EMEA requesting a subgroup analysis for a known high risk group, because, if this request was independent of the manufacturer, it would suggest that the manufacturer was not data-mining, and presenting only the “convenient” results. But, one has to wonder, why did the manufacturer only report results for hip fractures in the higher risk group? Similar results would have been expected for all non-vertebral fractures combined and for vertebral fractures. If these results were not consistent, they would be less likely to be published, because they would suggest that the hip fracture results were unreliable.
One has to wonder (and I will check when I have the time) if a similar subgroup analysis was performed for alendronate. If alendronate is more effective in the higher risk group than the lower risk group, it would lend indirect support to other drugs also being more effective in the higher risk group.
I did not understand NICE’s point that “From all discussions amongst experts that I have been part of, including discussion of the Committee and the GDG, there is no biological plausibility that any of the drugs under appraisal should be more efficacious in older women (in this case women over 74) than in younger women.”
I am not an expert in osteoporosis, but it is a generally accepted principle that people at higher risk are likely to benefit more from preventative treatment than people at lower risk. And, in fact, the NICE guidance on osteoporosis follows a risk-stratification based on age.
Interestingly enough, we at the EMA now generally try and avoid subgroup analyses. We are well aware that it’s not best practice and have internal trainings with the relevant staff to make them aware that an organisation presenting a subgroup analysis should have defined the subgroup before the trial started (or, at worst, during the trial’s conduct and before the final analysis) and that subgroups have to show pretty fantastic results to justify accepting them.
I’m surprised that the judge ruled in Servier’s favour on what is nothing more than an insubstantial point and surprised that Servier have brought the case. I can only presume that Servier believe that they will get sufficient sales in the period between this ruling and NICE reissuing their decision with a couple of additional paragraphs to justify the costs of the legal action.
Michael Power – I suppose that NICE’s point that “From all discussions amongst experts that I have been part of, including discussion of the Committee and the GDG, there is no biological plausibility that any of the drugs under appraisal should be more efficacious in older women (in this case women over 74) than in younger women.” means that, although the risk of fractures is greater, there is no reason (say, increased numbers of the relevant receptors on relevant cells of women once they hit 74) why a drug would work better in those over 74 than under 74.
Although you’re at greater risk of osteoporosis when you’re older, all that means is that a drug which prevents 10% of fractures will prevent 10% of more fractures in the older women than in the younger women, therefore it will prevent more fractures, but that doesn’t mean it’s more efficacious, just more useful to a society which has to pay for each fracture.
Does that make sense (I’m not trying to be patronising, I’m just unsure if what I’ve written makes sense).
Thanks Tom
“Although you’re at greater risk of osteoporosis when you’re older, all that means is that a drug which prevents 10% of fractures will prevent 10% of more fractures in the older women than in the younger women, therefore it will prevent more fractures, but that doesn’t mean it’s more efficacious, just more useful to a society which has to pay for each fracture.”
Your explanation makes perfect sense. I have done a little more digging and found that what NICE meant by “no biological plausibility that any of the drugs under appraisal should be more efficacious in older women (in this case women over 74) than in younger women.” is that the relative risk reduction should be assumed to be the same in different age groups. The point I was making is that the absolute risk reduction is greater in groups at higher risk.
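To make the relative versus absolute distinction concrete, here is a minimal sketch. The baseline risks and the 10% relative risk reduction are made-up illustrative values, not figures from the Servier trials, the Sheffield report or NICE's guidance, and the function names are my own.

```python
def absolute_risk_reduction(baseline_risk: float, relative_risk_reduction: float) -> float:
    """Absolute risk reduction = baseline risk x relative risk reduction."""
    return baseline_risk * relative_risk_reduction


def number_needed_to_treat(arr: float) -> float:
    """Number of patients treated to prevent one event = 1 / ARR."""
    return 1.0 / arr


# Hypothetical inputs: the same relative effect is assumed in both age groups.
RRR = 0.10
baseline_fracture_risk = {
    "younger women": 0.02,  # assumed baseline risk, for illustration only
    "older women": 0.08,    # assumed baseline risk, for illustration only
}

for group, baseline in baseline_fracture_risk.items():
    arr = absolute_risk_reduction(baseline, RRR)
    nnt = number_needed_to_treat(arr)
    print(f"{group}: baseline {baseline:.0%}, ARR {arr:.1%}, NNT about {nnt:.0f}")
```

Under these assumed inputs the drug is no more efficacious in the older group (the relative reduction is identical), yet the absolute reduction is four times larger there because the baseline risk is four times higher.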
I have reread the Court of Appeal’s judgement, but the summary is at too high a level for me to understand exactly what Servier were appealing against. But I found the Sheffield Health Technology Report, and it seems the basic problem they had with the Servier evidence is not so much that the subgroup of elderly women was not properly randomized, but that the published report does not contain enough detail to assess if the randomization was adequate, and perhaps worse, the third paper combined data from two trials without publishing enough details to check if this combination was appropriate.
An excellent, thought-provoking and informative post, Carl.