Deciphering trial outcomes can be a tricky business. As if many measures aren't hard enough to make sense of on their own, they are often combined in a complex maneuver called a composite endpoint (CEP) or composite outcome. The composite is treated as a single outcome. And journalists often phrase these outcomes in ways that give the impression that each of the separate components has improved.
Here's an example from the New York Times, reporting on the results of a major trial from the last American Heart Association conference:
"There were 6.4% fewer cardiac events - heart disease deaths, heart attacks, strokes, bypass surgeries, stent insertions and hospitalization for severe chest pain..."That individual statement sounds like the drug reduced deaths, bypasses, stents, and hospitalization for unstable angina, doesn't it? But it didn't. The modest effect was on non-fatal heart attacks and stroke only.*
CEPs are increasingly common: by 2007, well over a third of cardiovascular trials were using them. CEPs are a clinical trial shortcut: you need fewer people and less time to hit the jackpot. A trial's main pile of chips is riding on its pre-specified primary outcome - the one that answers the trial's central, most important question.
The primary outcome determines the size and length of the trial, too. For example, if the most important outcome for a chronic disease treatment is to increase the length of people's lives, you would need a lot of people to get enough events to count (the event in this case would be death). And it would take years to get enough of those events to see if there's anything other than a dramatic, sudden difference.
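The arithmetic behind this is simple enough to sketch. Here's a back-of-envelope illustration (my own, with invented rates - not figures from any trial) of why a rare event like death demands a huge trial, and why adding more event types shrinks it:

```python
# Hypothetical illustration: event-driven trials need a target number of
# events, so rarer events mean more patients (or longer follow-up).
# All rates and targets below are invented for illustration.

def patients_needed(target_events, annual_event_rate, years):
    """Rough approximation: expected events = patients * rate * years."""
    return round(target_events / (annual_event_rate * years))

# Say you need 400 events, deaths occur at 1% per year, follow-up is 5 years:
print(patients_needed(400, 0.01, 5))   # 8000 patients for death alone

# Fold in non-fatal heart attacks and strokes (say 4% per year combined):
print(patients_needed(400, 0.04, 5))   # 2000 patients for the composite
```

The same target number of events arrives four times faster per patient-year, so the trial can be a quarter of the size - that's the jackpot.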
But if you combine it with one or more other outcomes - like non-fatal heart attacks and strokes - you'll get enough events much more quickly. Put in lots, and you're really hedging your bets.
It's a very valuable statistical technique - but it can go haywire. Say you have 3 very serious outcomes that happen about as often as each other - but then you add another component that is less serious and much more common. The number of less serious events can swamp the others. Everything could even be riding on only one less serious component. But the CEP has a very impressive name - like "serious cardiac events." Appearances can be deceptive.
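To make the swamping concrete, here's a toy example with invented numbers (not from any real trial): three serious outcomes the treatment doesn't budge, plus one common, less serious one that it does:

```python
# Hypothetical illustration of a composite endpoint being swamped by its
# least serious component. All event counts are invented.

# Events per 1,000 patients in each arm
treatment = {"death": 10, "heart attack": 12, "stroke": 11, "hospitalization": 80}
control   = {"death": 10, "heart attack": 12, "stroke": 11, "hospitalization": 110}

composite_t = sum(treatment.values())  # 113
composite_c = sum(control.values())    # 143

# The composite looks like a solid benefit...
print(f"Composite: {composite_t} vs {composite_c} events per 1,000")

# ...but the entire difference comes from the least serious component.
for outcome in treatment:
    diff = control[outcome] - treatment[outcome]
    print(f"  {outcome}: {diff} fewer events per 1,000")
```

Call that composite "serious cardiac events" and the headline writes itself - even though deaths, heart attacks, and strokes didn't change at all.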
Enough data on the nature of the events in a CEP should be clearly reported so that this is obvious, but it often isn't. And even if the component events are reported deep in the study's detail, don't be surprised if it's not pointed out in the abstract, press release, and publicity!
There are several different ways a composite can be constructed, including techniques like weighting, which need to be transparent. Because it's combining events, there has to be a way of dealing with what happens when more than one event happens to the same person - and that's not always done the same way. The definitions might make it obvious, the most serious event might count first according to a hierarchy, or the event that happened to a person first might be the one counted. But exactly what's happening often won't be clear - maybe even most of the time.
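Those two counting rules - a severity hierarchy versus first-event-counts - can give different answers for the same patient. A minimal sketch, with an invented severity ordering and invented patient data:

```python
# Hypothetical sketch of two counting rules for a patient with multiple
# events. The severity ordering and patient data are invented.

SEVERITY = ["death", "stroke", "heart attack", "hospitalization"]  # most serious first

def count_by_hierarchy(events):
    """Count the patient once, by their most serious event."""
    return min(events, key=SEVERITY.index)

def count_first_event(events):
    """Count the patient once, by whichever event happened first."""
    return events[0]

# One patient, three events, in the order they occurred:
patient = ["hospitalization", "heart attack", "death"]

print(count_by_hierarchy(patient))   # death
print(count_first_event(patient))    # hospitalization
```

Same patient, same events - but one rule files them under "death" and the other under "hospitalization." Tallied across a whole trial, the choice of rule can shift how serious the composite looks.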
There's agreement on some things you should look out for (see for example Montori, Hilden, and Rauch). Is each component about as serious as the others, and are they likely to increase (or decrease) together in much the same way? If one is getting worse while another is getting better, the composite isn't really measuring a single impact.
The biggest worry, though, is when researchers play the slot machine in my cartoon (what we call the pokies, "Downunder"). I've stressed the dangers of hunting over and over for a statistical association (here and here). The analysis by Lim and colleagues found some suggestion that component outcomes are sometimes selected to rig the result. If the composite wasn't the pre-specified primary outcome, and wasn't specified in the trial's original registry entry, that's a worry. Then it wasn't really a tested hypothesis - it's a new one.
Composite endpoints - properly constructed, reported, and interpreted - are essential to getting us decent answers to many questions about treatments. Combining death with serious non-fatal events, for example, shows when a drop in a non-fatal outcome happened largely because people died before it could occur. But you have to be very careful once so much is compacted into one little data blob.
(Check out slide 14 to see the forest plot of results for the individual components the journalist was reporting on. Forest plots are explained here at Statistically Funny.)
More on understanding clinical trial outcomes:
- Another way to get clinical trial results quickly: surrogates and biomarkers (at Statistically Funny)
- Keeping risks in perspective (at Absolutely Maybe)
New this week: I'm delighted to now have a third blog, one for physicians with the wonderful team at MedPage Today. It's called Third Opinion.