I originally promised a review of the Bush Francis Catatonia Rating Scale, but while reviewing it, I came across some questions that I think are even more interesting. I will discuss Bush Francis, but I want to do it in the larger context of the challenges that psychiatrists face with diagnosis in general.

The Bush Francis Catatonia Rating Scale is a highly reliable tool for diagnosing and estimating the severity of catatonia. What that means, roughly, is that if 10 psychiatrists use the tool to diagnose a patient with catatonia, about one of them will disagree with the other 9. So, you have 9 yeas and 1 nay. To put that more technically, “inter-rater reliability was tested in 44 simultaneous ratings of 28 cases defined by the presence of ≥2 signs on the 14-item screen. Inter-rater reliability for total score on the rating scale was 0.93, and mean agreement of items was 88.2% (SD 9.9). Inter-rater reliability for total score on the screening instrument was 0.95, and mean agreement of items was 92.7% (SD 4.9).” 1
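To make “mean agreement of items” a little more concrete, here is a minimal sketch of how per-item percent agreement between two raters could be computed. The ratings, item subset, and function names are invented for illustration and are not taken from the cited study, which may have computed agreement differently.

```python
from statistics import mean

def item_agreement(rater_a, rater_b):
    """Fraction of cases on which two raters gave the same rating for one item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and the same length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def mean_item_agreement(ratings_a, ratings_b):
    """Average the per-item agreement over every item rated by rater A."""
    return mean(item_agreement(ratings_a[item], ratings_b[item]) for item in ratings_a)

# Hypothetical data: present (1) / absent (0) ratings for 3 items across 5 cases.
rater_a = {"mutism": [1, 1, 0, 0, 1], "staring": [1, 0, 0, 1, 1], "rigidity": [0, 0, 1, 1, 0]}
rater_b = {"mutism": [1, 1, 0, 0, 1], "staring": [1, 0, 1, 1, 1], "rigidity": [0, 0, 1, 1, 0]}

# The raters disagree on one "staring" case, so that item scores 4/5.
print(item_agreement(rater_a["staring"], rater_b["staring"]))  # 0.8
print(round(mean_item_agreement(rater_a, rater_b), 3))         # 0.933
```

Note that simple percent agreement is not the same thing as the reliability coefficient reported for the total score; it is just the easier of the two numbers to picture.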

If you remember back to the first episode, I defined catatonia using the DSM 5 criteria, which require 3 or more of 12 items. Using Bush Francis, the presence of 2 or more items is considered diagnostic. The Bush Francis diagnostic screening instrument has 14 items. Some of those items are more “lumpy” than the DSM items, but even more of them are more “splitty” than the DSM. Here is what I mean by that.

1. Immobility/stupor = DSM Stupor
2. Mutism = DSM Mutism
3. Staring (part of Stupor)
4. Posturing/catalepsy = DSM Posturing split from Catalepsy
5. Grimacing = DSM Grimacing
6. Echopraxia/echolalia = combined from separate DSM Echolalia and Echopraxia
7. Stereotypy = DSM Stereotypy
8. Mannerisms = DSM Mannerism
9. Verbigeration (stereotyped & meaningless repetition of words & phrases):  In the DSM this would likely be considered stereotypy or echolalia.
10. Rigidity 
11. Negativism = DSM Negativism
12. Waxy flexibility ~= DSM Waxy flexibility, only in the Bush Francis, Waxy Flexibility has an added component of initial resistance.  Their concept of waxy flexibility is more like a candlestick at room temperature snapping than of a warm candle bending.
13. Withdrawal ~ lumped into Stupor in DSM
14. Excitement ~ This is approximated by the DSM category of Agitation, and is specified to not be attributable to akathisia and is not goal directed.
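The two cutoffs mentioned above (2 or more of the 14 Bush Francis screening items versus 3 or more of the 12 DSM 5 items) can be sketched as a simple count. This is only an illustration of the counting rule: the function names are hypothetical, and it ignores the lumping and splitting differences between the two item lists.

```python
# The 14 screening items, named as in the list above.
BFCRS_SCREEN_ITEMS = frozenset({
    "immobility/stupor", "mutism", "staring", "posturing/catalepsy",
    "grimacing", "echopraxia/echolalia", "stereotypy", "mannerisms",
    "verbigeration", "rigidity", "negativism", "waxy flexibility",
    "withdrawal", "excitement",
})
BFCRS_CUTOFF = 2  # 2 or more screening items
DSM5_CUTOFF = 3   # 3 or more of the 12 DSM 5 items

def bfcrs_screen_positive(signs_present):
    """True if at least BFCRS_CUTOFF distinct screening items are present."""
    signs = set(signs_present)
    unknown = signs - BFCRS_SCREEN_ITEMS
    if unknown:
        raise ValueError(f"not BFCRS screening items: {sorted(unknown)}")
    return len(signs) >= BFCRS_CUTOFF

def dsm5_count_positive(num_dsm_signs):
    """True if the number of DSM 5 signs present reaches the DSM 5 cutoff."""
    return num_dsm_signs >= DSM5_CUTOFF

# Mutism and grimacing appear on both lists, so the counts line up here.
signs = ["mutism", "grimacing"]
print(bfcrs_screen_positive(signs))     # True
print(dsm5_count_positive(len(signs)))  # False
```

The same two signs screen positive on Bush Francis and fall short of the DSM 5 count, which is the gap the rest of this discussion keeps coming back to.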

The Bush Francis rating scale has two parts. The first is a 14-item diagnostic screening instrument. You can rate the severity of the first 14 items, but for diagnostic purposes you can consider 2 positive items, regardless of severity, as diagnostic of catatonia. Importantly, in the Bush Francis, these signs need to be present for 24 hours or longer. The rating scale goes on to items 15 through 23, which describe more possible catatonic presentations. These can be used to rate the severity of catatonia, but they are not included in the diagnostic screen.

They are:
15. Impulsivity (which would probably fall under the DSM criteria of agitation)

16. Automatic obedience (which you test by saying, “Stick out your tongue, I want to stick a pin in it,” and seeing if they do it. I think this is a ridiculous test because doctors tell patients to stick out their tongue all the time and many people will defer to the authority of a physician even if they are not catatonic… ok I’ll stop complaining).

17. Mitgehen. This is also called passive obedience and is tested by telling the patient not to let you raise their arm, but they do anyway when you apply gentle pressure that should be easy for them to overcome.  Obviously, they need to be aware enough to process your instructions.

18. Gegenhalten.  This is almost the opposite of Mitgehen, and is described as involuntary resistance to passive movement of a limb to a new position that increases with the speed of the movement.  I do not think they need to be fully conscious for you to do this, but that is unclear.

19. Ambitendency.  I am ambivalent about how to truly tell ambitendency from other signs of catatonia.  It is described as appearing stuck while initiating or completing a task.  The clearest mental picture I can conjure is halting, robotic type movements, during which a patient appears to have difficulty coordinating their body to complete a task.

20. Grasp reflex. This simply refers to the primitive reflex we were taught to test in newborns.

21. Perseveration.  I’ll refer to episode 10, where I complain about the lack of specific meaning of this term, but add that in this rating scale, it can refer to returning to the same topic or repeating the same movement.  How the latter is different from their definition of stereotypy is unclear.

22. Combativeness.  For this to score in Bush Francis, aggression or belligerence needs to be present in an undirected or inexplicable way.

23. Autonomic abnormality: think of this as the SIRS criteria of catatonia.  If in the last 24 hours there are abnormalities of temperature, blood pressure, respiratory rate, heart rate, excessive sweating, flushing or other autonomic signs, then score this as present.

If you remember only one of these additional signs and their nifty names, it should be autonomic abnormality, because this is a clue that your patient may be developing malignant catatonia and you need to pay close attention and make sure they receive treatment as soon as possible.  If you remember only one of these additional signs, it should be autonomic abnormality… wait, did I already say that?

If you want a video lesson on how to apply the BFCRS, then Google Bush Francis Rochester URMC. I will also add the link in the show transcript located at 

There are other scales used to diagnose and rate the severity of catatonia, including the Modified Rogers Scale (MRS), Rogers Catatonia Scale (RCS), Northoff Catatonia Rating Scale (NCRS), Braunig Catatonia Rating Scale (BCRS), and the Kanner Scale.  I am not going to discuss the other scales, but in the show transcript, I will add references.

Without doing any hard math or actual testing, you can probably see that since the BFCRS has more items than the DSM 5, and you can consider 2 or more signs as diagnostic in Bush Francis as opposed to 3 or more signs in the DSM, it is going to be more sensitive. It will more often diagnose catatonia, resulting in more patients getting treated, for better or for worse. Often sensitivity comes at the expense of specificity, and given the squirrelly nature of our understanding of catatonia, I imagine that this is likely the case.
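For anyone who does want a little math, here is a minimal sketch of the sensitivity and specificity tradeoff when only the cutoff changes. The eight cases are invented for illustration (each is a sign count paired with whether the patient truly has the condition), and the function name is hypothetical. It is not data from any study.

```python
def sensitivity_specificity(cases, cutoff):
    """Return (sensitivity, specificity) when 'cutoff or more signs' counts as positive.

    cases is a list of (num_signs, has_condition) pairs and is assumed to
    contain at least one true case and at least one non-case.
    """
    tp = fn = fp = tn = 0
    for num_signs, has_condition in cases:
        test_positive = num_signs >= cutoff
        if has_condition and test_positive:
            tp += 1
        elif has_condition:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Invented cases: (number of signs present, truly has the condition)
cases = [(5, True), (3, True), (2, True), (1, True),
         (2, False), (1, False), (0, False), (0, False)]

print(sensitivity_specificity(cases, cutoff=2))  # (0.75, 0.75)
print(sensitivity_specificity(cases, cutoff=3))  # (0.5, 1.0)
```

In this toy set, lowering the cutoff from 3 to 2 picks up one more true case and one more false positive, which is the tradeoff described above in miniature.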

Also, I mentioned that the Bush Francis is a highly reliable tool for diagnosis. Reliability by itself is not an approximation of validity. Just because something is reliable does not mean that it is meaningful. While a test that is unreliable is also not valid, a tool that is not valid can be highly reliable. Imagine a disgruntled employee at a ruler factory decides to mess with the settings of the ruler-making machine, producing rulers that are 12 and ½ inches long instead of 12 inches long. These rulers are all the same, and are highly reliable, but they are not valid. They are not measuring the thing that they say they are measuring. They are inaccurate.
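The ruler example can be put in numbers with a small sketch. Here, spread (how much the rulers differ from one another) stands in for reliability, and bias (how far the average ruler is from 12 inches) stands in for validity. The measured lengths are invented, and this mapping is my simplification of the analogy rather than a formal definition of either term.

```python
from statistics import mean, pstdev

NOMINAL_LENGTH_IN = 12.0  # what a ruler claims to be

def spread(lengths):
    """Standard deviation of the ruler lengths: small means consistent."""
    return pstdev(lengths)

def bias(lengths, true_value=NOMINAL_LENGTH_IN):
    """Average distance from the claimed length: near zero means accurate."""
    return mean(lengths) - true_value

# Invented lengths (in inches) of five rulers from the sabotaged machine.
sabotaged = [12.50, 12.51, 12.49, 12.50, 12.50]

print(f"spread: {spread(sabotaged):.3f} in")  # tiny, so the rulers agree with each other
print(f"bias:   {bias(sabotaged):.3f} in")    # about half an inch off
```

A tiny spread with a large bias is the numerical version of “highly reliable but not valid.”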

Using the BFCRS as an example, for it to have construct validity, it would need to measure catatonia in a way that predictably produces a result that is reflective of the patient’s actual disease process.  A higher score on the test would mean a more severe disease, but not only more severe disease in general, but more severe catatonia.  Construct validity is a measure of whether you are measuring a real thing or not.  Is your construct a reflection of reality?

Another type of validity is content validity.  It specifically refers to whether a test is measuring every aspect of a disease.  High content validity means you are not missing anything.  The BFCRS has more items than the DSM 5 and you might think that measuring more things would give you higher content validity, but measuring more content might come at the expense of construct validity, especially if you start measuring things that are not directly caused by the thing you think you are measuring.

There are too many types of validity to mention them all here, but I want to discuss two more that I feel are highly relevant to tests like the BFCRS and to your PRITE. These are internal and external validity, and they are related to how reliable a test is in different environments. The difference arises not because the criteria are applied unreliably, but because the construct you are measuring is not the same in different environments, either because it does not exist outside of the place it was first found, or because it is influenced by factors that weren’t predicted in the controlled environment. Because what you are measuring is influenced by different factors in different environments, it presents differently. An internally valid test measures a construct well in the particular environment where it is needed, such as an inpatient unit. Now, imagine taking a walk in 1980s New York City and grading the people you see on the Bush Francis. You’ll likely get far more positives than you have a real need for benzos. I doubt the BFCRS has high external validity, but there are not many tests of validity of psychiatric syndromes. Having the criterion that a sign needs to be present for at least 24 hours is likely to make the test both more reliable and more valid because you are only counting things that are not transient.

We don’t have good measures of validity for most psychiatric constructs (also called diagnoses), so we use reliability as a proxy.  It should be well known among psychiatry residents that psychiatrists in the US for decades reliably diagnosed black men with schizophrenia at a much higher rate than is probably true.  This raises questions about the reliability of the schizophrenia construct and validity of the content measures we use from the DSM to diagnose schizophrenia.  One might also conceptualize this as a difference between internal and external validity of the criteria (if you define “internal” as applying a test in mostly white areas versus “external” as mostly black areas).  Cultural formulations are supposed to aid psychiatrists in being more aware of the validity and reliability of our diagnoses.

None of this is to make you not use the Bush Francis or any other scale.  Instead, just be mindful.  Be aware of how well (or in my case how poorly) the diagnostic criteria are applied.

