PsyDactic

The STAR*D Trial: Scientifically Flawed or Scientific Fraud?

December 10, 2023 T. Ryan O'Leary Episode 44

The authors of the famous Sequenced Treatment Alternatives to Relieve Depression trial, or STAR*D, reported that about two-thirds, or 67%, of patients had achieved remission after 4 trials of antidepressant medication.  This remission rate has been questioned over the years, and in July of 2023 the journal BMJ Open published an article that purports to have reanalyzed the data from STAR*D using the original study design.  This re-analysis found much lower rates of remission, reporting the cumulative remission rate as only 35 percent.  How do two different sets of researchers using the same data set get a result that is just half of what the other researchers reported?  That is the mystery that I report on today.

Please leave feedback at https://www.psydactic.com.

References and readings (when available) are posted at the end of each episode transcript, located at psydactic.buzzsprout.com. All opinions expressed in this podcast are exclusively those of the person speaking and should not be confused with the opinions of anyone else. We reserve the right to be wrong. Nothing in this podcast should be treated as individual medical advice.

Welcome to PsyDactic - Residency Edition.  I am Dr. O’Leary, a 4th year psychiatry resident in the national capital region.  This is a podcast about psychiatry and neuroscience that I produce to help me understand what I do better.  Because this podcast is my own venture, I have to warn you that everything I say here is my own opinion, complete with my own biases, misinterpretations, and, probably most importantly, missing data.  If I wanted to, I could just cherry-pick some facts or studies that I want you to know about and pretend like these represent the whole picture.  I don’t have any fact-checkers or editorial staff to rein me in.  I try to be as complete and fair as I can, but in the end, I get to be the judge of that, so listeners beware.  I also do not speak for the Federal Government, the Department of Defense, the Defense Health Agency, any drug company, or the California Dried Plum Board, though diarrhea may or may not be a side effect of listening to this podcast.  Today I am going to talk about what might be the most famous psychiatric study of this century, the Sequenced Treatment Alternatives to Relieve Depression study, or STAR*D for short.

STAR*D is a must-know study among psychiatrists and primary care doctors.  It is even included in the book 50 Studies Every Psychiatrist Should Know.  The study, whose name is a mash-up of an acronym and an initialism, began in 2001, and the main results of each step of the study were published in a series of papers in 2006, along with a summary paper in the American Journal of Psychiatry that same year.

STAR*D was an open-label, multicenter trial that analyzed the effectiveness of various antidepressant strategies in adult patients who had presented to one of 41 clinics and who had screened positive for depression.  It was designed to be a real-world, intention-to-treat type of analysis.  It was large and expensive, including over 4,000 adults.  The study used a stepwise treatment strategy that started with a selective serotonin reuptake inhibitor, citalopram; if this did not result in remission, remaining patients were randomized to a different antidepressant or an antidepressant with augmentation in up to 3 more steps.  In the original design, a patient who had failed the first 3 trials would be exposed to one last drug trial.

It was important to do this because other trials estimating the effectiveness or efficacy of antidepressants tend to include only patients who screen positive for depression alone and do not have other comorbid psychiatric conditions.  By removing patients with multiple conditions from drug trials, the idea is that the effectiveness of the drug for the target illness can be estimated in a kind of best-case, noise-free scenario.  Trials also tend to recruit their participants actively, but this one included people who consented to the study after reporting depressive symptoms at a clinic.  The real world is frequently messy and uncontrolled, and so STAR*D was meant to figure out whether current antidepressant strategies were actually effective in the real world.

The authors of STAR*D reported in a summary paper that cumulatively about two-thirds or 67% of patients had achieved remission after 4 trials of medication.  In the first step, they reported that 37% of patients remitted.  Of the remaining patients, about 30% remitted in the second step, and if you combine the 3rd and 4th step, about 30% more remitted.  Altogether, in this real-world scenario, it appeared that about two-thirds of people would benefit from some kind of pharmacotherapy for depression, if they were actually willing to stay in therapy and try 4 different strategies for up to a year before achieving remission.  That sounds very effective on the surface.  It sounds like for every 3 patients we treat with antidepressants, 2 of them will get better if they stick with it.  At least, this is how the study has been reported.
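To see how those step-wise numbers compound into the headline figure, here is a minimal sketch in Python.  The per-step rates are my rounding of the reported values (step 1, step 2, and steps 3 and 4 combined), so the output lands near, not exactly on, the published 67%.

```python
# A minimal sketch of how step-wise remission rates compound into a
# cumulative rate. Per-step figures are rounded approximations of the
# summary paper's numbers and are illustrative only.

step_rates = [0.37, 0.30, 0.30]  # step 1, step 2, steps 3+4 combined

not_yet_remitted = 1.0
for rate in step_rates:
    # Each step's rate applies only to patients who have not yet remitted.
    not_yet_remitted *= (1 - rate)

cumulative_remission = 1 - not_yet_remitted
print(f"Cumulative remission: {cumulative_remission:.0%}")  # -> 69%
```

With my rounding this comes out to about 69% rather than the published 67%, but the mechanism is the point: each step's rate applies to the shrinking pool of patients who have not yet remitted.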

This remission rate has been questioned over the years, and in July of 2023 the journal BMJ Open published an article that purports to have reanalyzed the data from STAR*D using the original study design.  This re-analysis found much lower rates of remission, reporting the cumulative remission rate as only 35 percent.  How do two different sets of researchers using the same data set get a result that is just half of what the other researchers reported?  Well… that is the mystery that I report on today.

The primary author is Ed Pigott.  (I am not sure how to pronounce his name.  It could be PigAHT or PIGuht, but I prefer to pronounce it like Waiting for Godot, so I will say PiGOH.)  Dr. Pigott has a PhD in psychology and has published and blogged on this topic before.  His previous papers point out the various ways that the STAR*D authors appear to have deviated from their primary research design in ways that vastly inflated the apparent efficacy of the treatments.  He has authored other articles that are critical of the way the literature on SSRIs has been biased by various factors, including publication bias, various kinds of p-hacking (which could be an episode all on its own), and intentional obscuring of results.

These sorts of criticisms are not unique to Pigott.  There is a reckoning in psychology and medical science right now regarding a century of biased and at times apparently fraudulent results being published and accepted as dogma.  This has resulted in far more oversight of research, including the FDA and many journals requiring that studies register their protocols before the study is conducted, or even that outside firms be hired to administer the research, which can help prevent data manipulation, among other things.  Those funding studies may require that all results be published, even if they are not friendly to the drug, device, or treatment being studied.

There is now a broad, privately funded initiative that encourages and supports authors in gaining access to data from trials that were either not published or that reported results that other researchers believe are not a valid or accurate summary of the data.  It is called the RIAT (pronounced riot) initiative, and you can find their website at https://restoringtrials.org/.  They report that, at least as of August 2022, 100% of their funding came from a philanthropic organization called Arnold Ventures.

Today’s episode is in part about this larger issue.  The kinds of changes that were made to the STAR*D protocol can and should be analyzed in the context of research ethics and questionable research practices.  I also want to mention, before I go any further, that I am not making accusations of fraud.  Pigott personally stated in 2010, in a blog post on the site Mad in America, quote, “[T]his is a story of scientific fraud, with this fraud funded by the National Institute of Mental Health at a cost of $35 million.”  Unquote.  I very much doubt that the authors were intentionally defrauding anyone, though it is common for researchers to introduce bias by making decisions or assumptions that they feel are reasonable but that still inflate the significance of their work.  Accusations of fraud do raise red flags, but they do not directly address the appropriateness of the changes that were made to STAR*D’s initial protocol, or the validity of the reported results given those changes.

Pigott et al had four main criticisms of the STAR*D summary report.

First, they point out that the STAR*D investigators changed their primary outcome measure from the original protocol.  Instead of using data from blinded assessors using the Hamilton Depression Rating Scale (aka HAM-D or HRSD), the researchers converted data from an unblinded, clinician-administered scale called the Quick Inventory of Depressive Symptomatology, or QIDS, into HAM-D equivalents.  This was an important change for two reasons.  The first is that the HAM-D results were gathered by a blinded assessor over the phone, while the QIDS results were gathered by someone who was not blinded to the treatment.  We know that unblinding can have a dramatic effect on how results are reported or recorded, so this is concerning.  It was reported that the primary reason for this change was the very high dropout rate and the high rate of failure to gather data in a blinded fashion; data was more reliably gathered in the clinical setting.  Also, the primary research protocol had stipulated that failure to report HAM-D results would count as failure to remit during the analysis, so all that missing data would have potentially biased the results toward a lack of response or remission.  The opposite is true when including unblinded data, which can bias the results toward remission.  By making this change, the researchers were likely inflating remission rates.
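To make the direction of that bias concrete, here is a toy calculation.  All of the numbers are hypothetical, mine rather than STAR*D's; the sketch only illustrates how the choice of what to do with missing blinded scores moves the headline rate.

```python
# Toy calculation (all numbers hypothetical, not STAR*D's) showing how
# the handling of missing blinded HAM-D scores changes a remission rate.

n_total = 1000         # patients exiting a treatment step
n_blinded_remit = 300  # remitters according to the blinded HAM-D
n_missing = 250        # patients with no blinded HAM-D at exit
n_proxy_remit = 120    # of those, how many an unblinded proxy calls remitted
assert n_proxy_remit <= n_missing

# Per the original protocol: a missing HAM-D counts as failure to remit.
per_protocol = n_blinded_remit / n_total

# Per the summary paper's approach: fill the gaps with the unblinded proxy.
with_proxy = (n_blinded_remit + n_proxy_remit) / n_total

print(f"missing counted as non-remitters: {per_protocol:.0%}")  # 30%
print(f"missing filled by unblinded proxy: {with_proxy:.0%}")   # 42%
```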

The second criticism concerned another deviation from the initial study protocol: the inclusion of patients who did not even meet criteria for a depressive episode on their initial HAM-D results.  There were over 900 patients who had a HAM-D score of 14 or less, which was the initial cutoff.  Of these, about 100 patients scored less than 8, which would qualify them as already remitted even before the study started.  Including these patients could have made it look as if 100 more patients had remitted even though they started below the remission cutoff.  Patients with scores below 14 could be biased toward remission because they already had a sub-threshold symptom burden to begin with.  It is also possible that some of these patients got worse and no longer qualified as remitted, but the study was not designed to see how people without depression responded to treatment.

Changes in who was or was not excluded from the analysis did not stop there.  The third major criticism of the 2006 analysis was that it did not include all the drop-outs.  370 participants dropped out after starting citalopram in the first phase of the study but were not included in the final analysis the way the protocol had planned.  Participants who dropped out were supposed to be counted as non-remitters, but they were not.
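Again with toy numbers of my own invention, the arithmetic of the dropout criticism looks like this: excluding dropouts shrinks the denominator while the numerator stays fixed, so the apparent remission rate rises.

```python
# Toy numbers of my own invention, not the trial's. Excluding dropouts
# from the analysis, rather than counting them as non-remitters per
# protocol, inflates the apparent remission rate.

n_enrolled = 1000  # patients who started the step
n_dropouts = 200   # left before an exit assessment
n_remitted = 350   # documented remitters

per_protocol = n_remitted / n_enrolled                      # dropouts = non-remitters
dropouts_excluded = n_remitted / (n_enrolled - n_dropouts)  # dropouts dropped

print(f"dropouts counted as non-remitters: {per_protocol:.0%}")      # 35%
print(f"dropouts excluded from analysis:   {dropouts_excluded:.0%}") # 44%
```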

The final major note of contention was that the study was supposed to only bring forward non-remitters into the next step of treatment, but that did not happen with fidelity.  125 participants were moved forward into the next level of treatment and included in the analysis despite meeting criteria for remission prior to starting that level.

It appears that the results reported by the STAR*D authors were the most favorable results possible given the data, and they likely inflated the real-world effectiveness of the treatment strategies employed in the study.  The re-analysis calculated a cumulative remission rate of only 35-41%, depending on whether some QIDS scores were substituted for missing HAM-D scores, versus the 67% reported in the original study.

With regard to the study being realistic, Pigott et al also highlight that patients entering the study were treated for free with the highest standard of care available at the time.  Is this the way care is given in the real world?  Looking at the study from Pigott’s perspective makes it seem like the original data actually suggest that our treatment strategy for depression in the real world is at best barely effective, and possibly no more effective than placebo.  Even though this study was not a placebo-controlled trial, response rates of around one-third are commonly reported for placebo treatments.

It would not be fair to mic drop at this point.  The authors of STAR*D deserve a chance to respond.  Pigott had published similar criticisms in 2010 and 2015 but had not previously reanalyzed the entire data set.  Rush et al, the authors of the STAR*D trial, had not responded to those earlier publications, at least not that I can find.  They did respond to the July 2023 reanalysis with a letter to the American Journal of Psychiatry published this December.

Rush et al’s response criticizes Pigott et al’s removal of 941 patients who were included in the total analysis in the original STAR*D summary paper.  They propose that a truly real-world view of the effectiveness of the trials requires a more inclusive approach.  In the papers that preceded the STAR*D summary and treated each step separately, the authors note that they had been harshly criticized for coding patients with missing HAM-D scores as non-remitters, because this potentially resulted in too low an estimate of remission.  People without HAM-D scores could have remitted or not, so it is very conservative to score them as not remitted.  The HAM-D was scored at the beginning and end of each step of treatment, which means that there would be no mid-treatment result to track progression.  The QIDS score, which was administered at every follow-up, was reasonable to use as a proxy in the event that a patient did not return for follow-up.  They may not have returned because they felt so much better.  Finally, Rush et al claim that Pigott et al mischaracterize their paper entirely.  I will let their words speak for themselves.  Quote.

“What Pigott and colleagues fail to appreciate is that the overall outcomes of patients across 1 year of treatment reported by Rush et al. was not an “a priori”-identified analysis in the protocol but a secondary “post-hoc” report, specifically requested by the Editor-in-Chief of the American Journal of Psychiatry at that time, with the goal of summarizing the clinical outcomes—as measured by the self-reported QIDS-SR (capturing the symptom status of each patient at the last visit regardless of level and regardless of whether or not the HRSD was obtained at study exit)—of this complicated multilevel trial. As such, the use of different methods and alternate measures in secondary analyses is a well-accepted scientific approach to explore the data and develop new hypotheses for future research.”

Sometimes I find myself agreeing with everything someone says but still disagreeing with the fundamental conclusions they come to, and I think that this is one of those cases.  I do feel that Pigott et al sensationalized the shortcomings of the STAR*D trial and took a worst-case-scenario approach in their reanalysis.  However, I also feel that the STAR*D trial has not been read as, QUOTE, “a scientific approach to explore the data and develop new hypotheses for future research.”  UNQUOTE.  It is taught as the foundational confirmation of the effectiveness of the contemporary approach to pharmacotherapy for depression.

The weight of the evidence appears to be that STAR*D is, in fact, not all that it is cracked up to be, and the authors of the paper have not been vocal enough in tempering the huge wave of misinterpretation of their results.  It does not appear to be a blatant example of scientific fraud.  It is scientifically flawed, but not scientific fraud.  Pigott, in his 2010 blog post, may have been suffering from a hefty dose of moral outrage that influenced his choice of words.  His own re-analysis attempts to swing the pendulum as far back as it can go, but in doing so, it seems that both studies, the original and its reanalysis, probably swing past the truth.

Thank you for listening.  I am Dr. O, and this has been an episode of PsyDactic, Residency Edition.

Pigott HE, Kim T, Xu C, Kirsch I, Amsterdam J. What are the treatment remission, response and extent of improvement rates after up to four trials of antidepressant therapies in real-world depressed patients? A reanalysis of the STAR*D study's patient-level data with fidelity to the original research protocol. BMJ Open. 2023 Jul 25;13(7):e063095. doi: 10.1136/bmjopen-2022-063095. PMID: 37491091; PMCID: PMC10373710.

Pigott HE. The STAR*D Trial: It Is Time to Reexamine the Clinical Beliefs That Guide the Treatment of Major Depression. Can J Psychiatry. 2015 Jan;60(1):9-13. doi: 10.1177/070674371506000104. PMID: 25886544; PMCID: PMC4314062.

Pigott HE, Leventhal AM, Alter GS, Boren JJ. Efficacy and effectiveness of antidepressants: current status of research. Psychother Psychosom. 2010;79(5):267-79. doi: 10.1159/000318293. Epub 2010 Jul 9. PMID: 20616621.

Rush AJ, Trivedi M, Fava M, Thase M, Wisniewski S. The STAR*D Data Remain Strong: Reply to Pigott et al. Am J Psychiatry. 2023 Dec 1;180(12):919-920. doi: 10.1176/appi.ajp.20230869. PMID: 38037409.

Rush AJ, Fava M, Wisniewski SR, Lavori PW, Trivedi MH, Sackeim HA, Thase ME, Nierenberg AA, Quitkin FM, Kashner TM, Kupfer DJ, Rosenbaum JF, Alpert J, Stewart JW, McGrath PJ, Biggs MM, Shores-Wilson K, Lebowitz BD, Ritz L, Niederehe G; STAR*D Investigators Group. Sequenced treatment alternatives to relieve depression (STAR*D): rationale and design. Control Clin Trials. 2004 Feb;25(1):119-42. doi: 10.1016/s0197-2456(03)00112-0. PMID: 15061154.