# Big O Of Sample Pooling: Test Optimization During Epidemics

- —a divide and conquer approach to Coronavirus test shortages
- 日期：2020年6月13日 16:02, 星期六
- 作者： Jesse Riggs

### A Shortage Of Tests

Since the onset of the Coronavirus pandemic, the medical community has dealt with a shortage of not only tests, but also professionals who can administer tests. Recently, the FDA has said that it will not object to a technique that could reduce the number of tests used. But, this technique is not anything new. It’s origins date back to 1943.

### Syphilis And WWII

Syphilis is a bacterial infection that, in it’s tertiary stage, can cause paralysis and dementia. The CDC reports that syphilis cases have dropped dramatically since 1941. But, during WWII, this disease posed a threat to US military operations. To lower the risk of soldiers becoming incapacitated, or problematic on the battlefield, the US Army started testing possible recruits. However, tests for antibodies were expensive and scarce. After serving as an operations analyst for the US Army Air Forces, Robert Dorfman offers an anecdote on how this problem was addressed. What he describes is called “group testing”.

### How Group Testing Works

The idea is to pool blood samples together and test multiple subjects at once. If a pooled test comes back negative, then we’ve only used one test. So, what happens when a group test comes back positive?

Let’s look at an example. Robert would like to test eight recruits. So, he divides each blood sample into four equal parts. The first quarter of each sample is pooled together for the first test. If it tests positive, then he picks samples from four of the recruits at random, and then tests the pool of their samples. If the test comes back negative, then he knows that the positive sample came from one of the other four samples. So, he pools two samples from the four that were not tested, and this test comes back negative. So, Dr Dorfman has narrowed the positive case to two samples, and he repeats the process one more time to find that one sample is positive.

This is great. It takes one test to verify that no samples test positive, and it takes only four to find a single positive sample among eight. But, this is only the outcome of two scenarios. There are many different cases where multiple samples could be positive. In computer science, we often look at how algorithms perform, based on how efficient they are at solving the worst case scenario. So, with Dorfman’s group testing, what’s the worst case scenario?

Let’s think about what happens when all eight samples test positive. It takes one test to verify that at least one sample in the eight are positive with syphilis. So, we make two pools, each containing four samples, to determine that at least two samples are positive. It takes twice as many tests to verify that there are at least two positives than there are at least one. Now, we test four pools of two samples each. This is again twice as many tests as the previous round. You may see a pattern here. Generally speaking, for any pool of n people, we’ll need to run a number of tests equal to ${2}^{0}+{2}^{1}+{2}^{2}+{2}^{3}+\cdots +{2}^{\mathrm{log}2\left(n\right)}$ ( where ${2}^{\mathrm{log}2\left(n\right)}=n$). In our example, we would run $1+2+4+8=15$ tests to find that all eight are positive. Since $15>8$ we have wasted seven tests.

You may notice that the formula for the worst case scenario has four terms. This is why we divided sample into fourths. In fact, the number of divisions we will make will be equal to $1+{\mathrm{log}}_{2}(n)$. Notice that the tree shown above is triangular. We could count the number of tests using the area formula for a triangle. The total number of tests, in the worst case scenario, is

$$\frac{1}{2}\mathit{base}\ast \mathit{height}=\frac{1}{2}n{\mathrm{log}}_{2}(n)$$

In computer science, we measure the upper bounds of required resources in Big O notation. We say that the function of group testing for n recruits is $O(n{\mathrm{log}}_{2}(n))$. After looking at this worst case scenario, we may wonder if there is a better way to group test.

### Origami Assay Multiplexing

The Origami Assay is a testing solution that uses a constant number of tests. Given a single positive sample, the Origami Assay can pick it out of a collection of 1122 samples with only 93 tests. The idea is to divide up the samples and then pool them in such a way that a collection of failed assays can be mapped to a common sample.

If any two samples map to the same two assays, then we have a collision. If one tests positive, then they will both test positive. In order to avoid this scenario, we limit the number of samples by some function of the number of assays. Notice that each sample is divided into two. We have four assays. How many ways can we choose two of the four assays?

In the mathematical field of combinatorics, we could ask, given n samples, how many ways can we choose k? Pronounced: n choose k.

$$\left(\begin{array}{c}n\\ k\end{array}\right)=\frac{n!}{k!(n-k)!}$$

In our case of four choose two, we have

$$\frac{4!}{2!(4-2)!}=6$$

If the number of samples exceeds this number, then at least two of the samples will have to share both their assays. By ensuring that two sampled don’t map to the same assay, we’ve solved the problem when we have a single positive sample. But, there are other situations, in which there are multiple positive samples, where collisions do occur.

Imagine the scenario in which samples 2 and 4 are positive. Since 2 maps to A, and 4 maps to B, both A and B fail. And since 1 maps to only A and B, then 1 erroneously tests as being positive. In cases where there are more than one positive sample, we could see false positives.

### Summary

Group testing is a way to reduce the number of tests required to identify people who have an illness. This method works well for needle in the haystack situations, where there are only a few positive cases in a large group of test subjects. Current implementations can use ten times fewer tests than samples, but have a possibility of false positives. This risk is reduced when illness rates are low.

For more information, check out the citations below.

### References

Big O Of Sample Pooling: Test Optimization During Epidemics by Jesse Riggs is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Based on a work at jesse-riggs.com.