Thursday, February 4, 2021

Testing for Covid 19, and Bayes' Theorem: An Entry-Level Primer for Those Interested in Medical Testing Accuracy

I put together a little spreadsheet when Covid was first entering our awareness, in early 2020. Link to spreadsheet model (it may open in a Google app; if so, there will be an "open" button somewhere that allows it to be opened in Excel, which may be better).

There was an oft-repeated claim that the tests, though slow to come out, were highly accurate--I think they were often said to be 99% accurate, which is indeed very high. However, having been exposed to Bayesian revisions earlier in my career, I thought to see what this really means for accuracy in the field. Thus the spreadsheet. I pulled it up again recently, and though the early Covid testing issues are behind us, I still thought it might be interesting to some people. So I updated it as a learning tool--people are more interested in testing accuracy than they ever were before.

There are three inputs. The first two are what often passes for the "accuracy" of the test: the sensitivity of the test (its probability of identifying the disease among those known to have it) and the specificity of the test (its probability of identifying the absence of disease among those known not to have it). The third input is the percentage prevalence of the disease in the tested population.

What we're interested in is the likelihood of true and false positive test results, and of true and false negative results--a more complete assessment of accuracy.
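For readers who prefer code to a spreadsheet, the central calculation can be sketched in a few lines of Python (the function and variable names here are mine, not the spreadsheet's). Bayes' theorem gives the probability that a positive result is a true positive, from exactly the three inputs just described:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result is a true positive (Bayes' theorem).

    Positives come from two groups: sick people correctly flagged,
    and healthy people incorrectly flagged.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# A 99% sensitive and specific test, 1% prevalence:
ppv = positive_predictive_value(0.99, 0.99, 0.01)
print(f"{ppv:.0%}")  # 50% -- half the positives are true, half are false
```

The complement of this value is the share of positive results that are false, which is the quantity discussed below.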

Now it won't be a surprise to anyone that the sensitivity and specificity of the test are very important to the test's accuracy. It is very intuitive that lower values for these will give less robust results. (For simplicity here, we'll treat sensitivity and specificity as having the same value, though that often is not the case in the real world of medical testing; the attached model allows the values to differ.)

But the prevalence of a disease in the population being tested presents an interesting dynamic affecting real-world accuracy as experienced in the field. Let's say that sensitivity and specificity are both a highly accurate 99%. If we're testing randomly selected persons drawn from the general population, then the prevalence might be only 1% or 2% (or less) of the population at any given time. That was perhaps the situation in the early days of testing, and perhaps even now. But if the prevalence is only 1%, then 50% of the positive tests will be false positives, and if 2%, then one-third will be false. This means that when prevalence is low, the tests give very little certainty even if the tests themselves have high accuracy values; they don't establish positivity with much confidence. But--more in a moment.

Let me mention that if sensitivity and specificity are lower, say 95% (still high), and the prevalence is 2%, then nearly three-quarters of the positive test results will be false (72.06%). One must be especially dubious of the information value of testing when accuracy is high but not exceptionally high and prevalence is low. And in many common testing contexts, prevalence is indeed very low.
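The 95%-accurate, 2%-prevalence case above can be worked with plain counts rather than probabilities--the same intuition the spreadsheet builds, sketched here in Python (the population size of 10,000 is arbitrary and mine):

```python
# Imagine testing 10,000 people with a 95% sensitive and specific test
# when 2% of them actually have the disease.
population = 10_000
sensitivity = specificity = 0.95
prevalence = 0.02

sick = population * prevalence                 # 200 people have the disease
healthy = population - sick                    # 9,800 do not
true_positives = sick * sensitivity            # 190 correctly flagged
false_positives = healthy * (1 - specificity)  # 490 incorrectly flagged

share_false = false_positives / (true_positives + false_positives)
print(f"{share_false:.2%}")  # 72.06% -- the 490 false alarms swamp the 190 real cases
```

The point jumps out of the counts: the healthy group is so much larger than the sick group that even a small error rate applied to it produces more false positives than there are true positives.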

It gets better if prevalence is higher. Many people who test today in order to travel, go to work, or for other general reasons can be considered randomly drawn from the larger population, and thus have a low prevalence just as above. But a large portion test because they have symptoms that cause them concern--they are evaluated as possibly infected, either on their own or by a medical provider (they have a cough, or a fever, or whatever else might give cause for concern). Perhaps the prevalence in this concerned portion of the testing population is higher--let's be brave and assert baldly that it is 25%, for an example. That reduces the false positives to 13.64% for a 95% accurate test, and to 2.94% if the accuracy values are 99%. Certainly a test with fewer than 3% false positives is very informative. To generalize: whenever self-selection, or medical screening for symptoms indicative of the need for a test, increases the functional prevalence in the tested group, the value of a positive test result rises.
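Those 25%-prevalence figures fall out of the same formula. A quick sketch in Python (the function name is mine) that reproduces them:

```python
def fp_share(sensitivity, specificity, prevalence):
    """Fraction of positive results that are false positives (Bayes' theorem)."""
    false_pos = (1 - prevalence) * (1 - specificity)
    true_pos = prevalence * sensitivity
    return false_pos / (true_pos + false_pos)

# With prevalence raised to 25% by screening or self-selection:
print(f"{fp_share(0.95, 0.95, 0.25):.2%}")  # 13.64%
print(f"{fp_share(0.99, 0.99, 0.25):.2%}")  # 2.94%
```

Varying the prevalence argument is a quick way to see how strongly it drives the false-positive share, holding test accuracy fixed.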

This discussion should cause the reader to give some thought to the meaningfulness of a positive test for any medical condition, especially for the asymptomatic or the unscreened. But the news is not all bad.

The power of a negative test result is quite high; there are far fewer false negatives on similar assumptions. What this means is that a negative test is an imperfect but nonetheless reasonably strong indicator that one doesn't in fact have the disease. This is important to travelers and others wanting to demonstrate freedom from a condition.
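The corresponding calculation for negative results (again, names are mine) shows just how small the false-negative share is on assumptions similar to those above--a 95% accurate test at 2% prevalence:

```python
def fn_share(sensitivity, specificity, prevalence):
    """Fraction of negative results that are false negatives (sick but cleared)."""
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    return false_neg / (false_neg + true_neg)

print(f"{fn_share(0.95, 0.95, 0.02):.3%}")  # roughly 0.1% of negatives are wrong
```

The asymmetry mirrors the false-positive case: negatives overwhelmingly come from the large healthy group, which the test handles well, so a negative result carries real weight.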

Why do we use tests when the false positives are high? Good question. There is a well-publicized concern that over-testing is endemic in medicine--that testing is far more widespread than many medical practitioners would like to see. Tests are apparently routinely ordered in marginal cases, including many tests of only modest accuracy given to patients with unclear indications, and the results of such tests are especially high in false positives. Some think this comes from an unhealthy concern over liability on the part of the prescribing provider; there is "safety" for the practitioner in checking extra boxes, perhaps.

But--what we do know is this: when the downside of having a disease is high, and when there is a premium on early diagnosis and treatment, as for Covid 19, it probably is appropriate to treat all patients who test positive as if they had the disease, even if the test result might well be a false positive. It's a "better safe than sorry" decision. This is also the case for many other medical conditions that are tested for--but not for all.

An interesting side note here is that all of these false positives from Covid testing of the general population may very well be why we have so many "asymptomatic" cases! These may be, in whole or in part, people who test positive but who in fact are not sick: false positives.

Anyway, you can use the spreadsheet to try out different input assumptions, and, if desired, to build intuition for why false positives are often so much higher than we might naively think. It works the problem several different ways, in the hope that at least one of them matches your learning style!

Statisticians have adapted Bayesian revision to a great many far more complicated situations, not so easily shared. But this will give the reader some good basic intuition, in the framework of what really is a pretty straightforward application. And one very relevant today.