Achieving maximum speedup in multi-level acceleration for massive coronavirus testing

It is well known that sample pooling provides an effective and efficient way to speed up coronavirus testing among massive numbers of asymptomatic individuals. The method of multi-level acceleration for asymptomatic COVID-19 screening has been introduced, and the optimal group sizes have been obtained for one and two levels. However, several challenges remain. First, it is not clear how to find the optimal group sizes for three or more levels. Second, there is a lack of closed-form expressions for the optimal group sizes for two or more levels. Third, it is not clear how to determine the optimal number of levels. Last, the maximum achievable speedup is not known. The motivation of this paper is to address all of these challenges. Optimizing a hierarchical pooling strategy means choosing its number of levels and the group size of each level. In this paper, based on multi-variable optimization and Taylor approximation, we derive closed-form expressions for the optimal number of levels $d^*$, the optimal group sizes $m_1^*, m_2^*, \ldots, m_{d^*}^*$, and the maximum possible speedup $1/(e p_0 \ln(1/p_0))$ of a hierarchical pooling strategy, where $p_0$ is the fraction of infected people. This speedup is nearly a linear function of the reciprocal of $p_0$, in the sense that it is asymptotically greater than any sub-linear function $(1/p_0)^{1-\epsilon}$ of the reciprocal of $p_0$ for any small $\epsilon > 0$. Using the results in this paper, we can quickly and easily predict the performance of an optimal hierarchical pooling strategy. For instance, if the fraction of infected people is 0.0001, an 8-level hierarchical pooling strategy can achieve a speedup of nearly 400.


Background
It is well known that sample pooling provides an effective and efficient way to speed up coronavirus testing among massive numbers of asymptomatic individuals [1,2]. Sample pooling strategies can save substantial time and resources compared to individual testing during epidemic surveillance and large-scale COVID-19 screening [3,4]. It was reported that up to 89% fewer tests would be required for group sizes of 3-25 in a population of 150,000 with an infection prevalence of 1% [5]. It was also found that by pooling 384 samples into 48 groups, both an 8-fold increase in testing efficiency and an 8-fold reduction in test costs can be achieved [6]. The approach of sample pooling and group testing has been introduced [7,8], adopted and applied [9-15], extensively studied [5,6,16-22], and reviewed [23,24]. The method of multi-level acceleration for asymptomatic COVID-19 screening was introduced in [25], where the optimal group sizes were obtained for one and two levels. However, several challenges remain. First, it is not clear how to find the optimal group sizes for three or more levels. Second, there is a lack of closed-form expressions for the optimal group sizes for two or more levels. Third, it is not clear how to determine the optimal number of levels. Last, the maximum achievable speedup is not known. The motivation of this paper is to address all of these challenges.

Contributions
Optimizing a hierarchical pooling strategy means choosing its number of levels and the group size of each level. In this paper, based on multi-variable optimization and Taylor approximation, we derive closed-form expressions for the optimal number of levels $d^* = \ln(1/\ln(1/q_0)) - 1$, the optimal group sizes $m_1^* = e^{d^*} = 1/(e p_0)$, $m_2^* = e^{d^*-1} = 1/(e^2 p_0)$, ..., $m_{d^*}^* = e = 1/(e^{d^*} p_0)$, and the maximum possible speedup $1/(e p_0 \ln(1/p_0))$ of a hierarchical pooling strategy, where $p_0$ is the fraction of infected people. This speedup is nearly a linear function of the reciprocal of $p_0$, in the sense that it is asymptotically greater than any sub-linear function $(1/p_0)^{1-\epsilon}$ of the reciprocal of $p_0$ for any small $\epsilon > 0$.
The paper is organized as follows. In Section 2, we describe the hierarchical pooling strategy and analyze its performance. In Section 3, we derive closed-form expressions for the optimal group sizes for one and two levels, and confirm their accuracy by comparing them with known solutions. In Section 4, we derive closed-form expressions for the optimal group sizes and the optimal number of levels, and present numerical data. We conclude the paper in Section 5.

Hierarchical pooling strategy
In this section, we describe the hierarchical pooling strategy and analyze its performance.

Description of the strategy
A hierarchical pooling strategy involves pooling samples from multiple people and works as follows. A $d$-level hierarchical pooling strategy ($\mathrm{HPS}_d$) has $d \ge 1$ levels. The size of a level-$j$ group is $m_j$, where $1 \le j \le d$. For convenience, a population of size $N$ can be treated as a level-0 group of size $m_0 = N$.
Algorithm 1: $\mathrm{HPS}_d(j, S)$
(1) ...
(2) Perform a group test for $S$;
(3) if (the group test result of $S$ is negative)
(4)     return $P$;
(5) end if;
(6) ...
(7) $n \leftarrow m_j/m_{j+1}$;
(8) Divide $S$ into $S_1, S_2, \ldots, S_n$;
(9) for $k \leftarrow 1$ to $n$ do
(10)     $P_k \leftarrow \mathrm{HPS}_d(j+1, S_k)$;
(11)     $P \leftarrow P \cup P_k$;
(12) end for;
(13)-(19) ...
Algorithm 1 gives a recursive description of the $\mathrm{HPS}_d$ procedure. On level $j$, a group test is performed for a level-$j$ group (which is divided from a level-$(j-1)$ group) of size $m_j$ (line 2). If the test result of a level-$j$ group of $m_j$ samples is negative, we know that all the individual samples in the group are negative (lines 3-5). If the test result of a level-$j$ group of $m_j$ samples is positive, where $1 \le j \le d-1$, then the $m_j$ samples proceed to level $j+1$, i.e. they are divided into level-$(j+1)$ groups of size $m_{j+1}$, which are processed by using the same $\mathrm{HPS}_d$ procedure (lines 7-12). On level $d$, the individual samples of a level-$d$ group are tested one by one without sample pooling (lines 14-19).
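The recursive procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the names `hps` and `screen`, the representation of samples as `(id, infected)` pairs, the choice to split the population directly into level-1 groups, and the assumption of error-free tests (a pool is positive exactly when it contains an infected sample) are all choices made here for the example.

```python
def hps(samples, sizes, level, counter):
    """Screen one level-`level` group.

    samples: list of (id, infected) pairs forming the group
    sizes:   [m_1, ..., m_d], the group size of each level
    counter: one-element list accumulating the number of tests performed
    Returns the set of ids found to be positive.
    """
    positives = set()
    counter[0] += 1                          # one group test for the pooled samples
    if not any(infected for _, infected in samples):
        return positives                     # negative pool: everyone is cleared
    if level < len(sizes):
        step = sizes[level]                  # m_{level+1} (the list is 0-based)
        for start in range(0, len(samples), step):
            positives |= hps(samples[start:start + step], sizes, level + 1, counter)
    else:
        for ident, infected in samples:      # last level: test one by one
            counter[0] += 1
            if infected:
                positives.add(ident)
    return positives


def screen(population, sizes):
    """Split the population into level-1 groups and screen each of them."""
    counter = [0]
    found = set()
    for start in range(0, len(population), sizes[0]):
        found |= hps(population[start:start + sizes[0]], sizes, 1, counter)
    return found, counter[0]


population = [(i, i == 5) for i in range(64)]   # 64 samples, one infected
print(screen(population, [16, 4]))
```

With 64 samples, a single infected individual, and group sizes (16, 4), this sketch performs 4 level-1 tests, 4 level-2 tests, and 4 individual tests, i.e. 12 tests instead of 64.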

Analysis of the strategy
Let us define the following variables.
• $p_0$: the probability that the test result of one individual is positive.
• $q_0$: the probability that the test result of one individual is negative.
• $p_j$: the probability that the test result of one level-$j$ group is positive under the condition that the test result of a level-$(j-1)$ group is positive, where $1 \le j \le d$.
• $q_j$: the probability that the test result of one level-$j$ group is negative under the condition that the test result of a level-$(j-1)$ group is positive, where $1 \le j \le d$.
• $T_j$: the expected number of tests for one level-$j$ group, where $1 \le j \le d$.
• $\tilde{T}_j$: the expected number of tests for one level-$j$ group under the condition that the test result of the level-$j$ group is positive, where $1 \le j \le d$.
The following theorem gives p j and q j for all 1 ≤ j ≤ d.

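Theorem 2.1: For a $d$-level hierarchical pooling strategy, we have $q_1 = q_0^{m_1}$, $p_1 = 1 - q_1$,
$$q_j = \frac{q_0^{m_j}\left(1 - q_0^{m_{j-1}-m_j}\right)}{p_1 p_2 \cdots p_{j-1}},$$
and $p_j = 1 - q_j$, for all $2 \le j \le d$.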

Proof:
The equations for $q_1$ and $p_1$ are straightforward. As for $q_j$, where $2 \le j \le d$, we have
$$q_j = \frac{q_0^{m_j}\left(1 - q_0^{m_{j-1}-m_j}\right)}{p_1 p_2 \cdots p_{j-1}},$$
where $p_1 p_2 \cdots p_{j-1}$ is the probability that the test result of a level-$(j-1)$ group is positive (i.e. the condition), which implies that the test results of all corresponding level-1, ..., level-$(j-2)$ groups are positive; $q_0^{m_j}$ is the probability that all the $m_j$ samples in a level-$j$ group are negative (i.e. the test result of one level-$j$ group is negative); and $(1 - q_0^{m_{j-1}-m_j})$ is the probability that at least one of the remaining $(m_{j-1} - m_j)$ samples in the same level-$(j-1)$ group is positive (to keep the condition). The equations for $p_j$, where $2 \le j \le d$, are straightforward.
The following theorem gives closed-form expressions of $p_j$ and $q_j$ for all $2 \le j \le d$.

Theorem 2.2:
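For a $d$-level hierarchical pooling strategy, we have
$$q_j = \frac{q_0^{m_j} - q_0^{m_{j-1}}}{1 - q_0^{m_{j-1}}}, \qquad p_j = \frac{1 - q_0^{m_j}}{1 - q_0^{m_{j-1}}},$$
for all $2 \le j \le d$.
Proof: We can prove by induction on $j \ge 2$. First, it is easy to verify that the claim is correct for $p_2$ and $q_2$. Next, we assume that the claim holds for $p_2$ and $q_2$, ..., $p_{j-1}$ and $q_{j-1}$. For $q_j$, we notice that
$$p_1 p_2 \cdots p_{j-1} = 1 - q_0^{m_{j-1}},$$
by the induction hypothesis, which yields $q_j$ and $p_j$.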
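As a numerical sanity check, the sketch below computes $q_j$ and $p_j$ from the conditional-probability argument used in the proof of Theorem 2.1 and compares the running product $p_1 p_2 \cdots p_j$ with $1 - q_0^{m_j}$, the unconditional probability that a level-$j$ group is positive. The function name and the example values ($q_0 = 0.99$, group sizes 64, 16, 4) are assumptions made only for illustration.

```python
def conditional_probabilities(q0, sizes):
    """Return [(p_1, q_1), ..., (p_d, q_d)] for group sizes m_1 > ... > m_d."""
    result = []
    prob_parent_positive = 1.0   # p_1 * ... * p_{j-1}; empty product for j = 1
    for j, m in enumerate(sizes):
        if j == 0:
            q = q0 ** m          # all m_1 samples are negative
        else:
            m_prev = sizes[j - 1]
            # All m_j samples are negative AND some other sample of the parent
            # group is positive, divided by P(parent group is positive).
            q = q0 ** m * (1 - q0 ** (m_prev - m)) / prob_parent_positive
        p = 1 - q
        result.append((p, q))
        prob_parent_positive *= p
    return result


q0 = 0.99
sizes = [64, 16, 4]
running = 1.0
for (p, q), m in zip(conditional_probabilities(q0, sizes), sizes):
    running *= p
    # Columns: m_j, p_j, p_1...p_j, 1 - q0^(m_j); the last two should agree.
    print(m, round(p, 6), round(running, 6), round(1 - q0 ** m, 6))
```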
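Let $T_{\text{pooling}}(m_1, m_2, \ldots, m_d)$ be the expected number of tests of a $d$-level hierarchical pooling strategy. The following theorem gives $T_{\text{pooling}}(m_1, m_2, \ldots, m_d)$, and $T_j$ and $\tilde{T}_j$ for all $1 \le j \le d$.

Theorem 2.3: For a $d$-level hierarchical pooling strategy, we have
$$T_{\text{pooling}}(m_1, m_2, \ldots, m_d) = \frac{N}{m_1}\,T_1,$$
$$T_j = 1 + p_j \tilde{T}_j, \quad 1 \le j \le d,$$
$$\tilde{T}_j = \frac{m_j}{m_{j+1}}\,T_{j+1}, \quad 1 \le j \le d-1, \qquad \tilde{T}_d = m_d.$$
Proof: Consider a level-$j$ group. If the test result of the group is negative (which happens with probability $q_j$), only one test is required; if the test result of the group is positive (which happens with probability $p_j$), $\tilde{T}_j + 1$ tests are required, one for the group test, and $\tilde{T}_j$, the expected number of tests given a positive group result, for proceeding to level $j+1$. Hence, the expected number of tests for one level-$j$ group is
$$T_j = q_j + p_j(\tilde{T}_j + 1) = 1 + p_j \tilde{T}_j.$$
The following theorem gives a closed-form expression of $T_j$ for all $1 \le j \le d$.

Theorem 2.4: For a $d$-level hierarchical pooling strategy, we have
$$T_j = 1 + \sum_{k=j}^{d} (p_j p_{j+1} \cdots p_k)\,\frac{m_j}{m_{k+1}},$$
for all $1 \le j \le d$, where $m_{d+1} = 1$.
Proof: We can prove by induction on $j$, from $d$ down to 1. For $j = d$, we have $T_d = 1 + p_d m_d$. Next, we assume that the claim holds for $T_{j+1}$. For $T_j$, we have
$$T_j = 1 + p_j \frac{m_j}{m_{j+1}} T_{j+1} = 1 + p_j \frac{m_j}{m_{j+1}} + \sum_{k=j+1}^{d} (p_j p_{j+1} \cdots p_k)\,\frac{m_j}{m_{k+1}}.$$
This proves the theorem.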
Note that the number of tests without sample pooling is $N$. Therefore, the speedup of a $d$-level hierarchical pooling strategy is
$$S(m_1, m_2, \ldots, m_d) = \frac{N}{T_{\text{pooling}}(m_1, m_2, \ldots, m_d)}.$$
The biggest challenge is to find $m_1, m_2, \ldots, m_d$ such that $S(m_1, m_2, \ldots, m_d)$ is maximized. In fact, the number $d$ of levels should also be optimized.
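To make the speedup concrete, the following sketch evaluates it by linearity of expectation: every individual belongs to one level-1 group, which is always tested, and a level-$(k+1)$ group (or, on the last level, an individual sample) is tested only if its enclosing level-$k$ group is positive, which happens with probability $1 - q_0^{m_k}$. The function name `speedup`, the example values, and the assumption of error-free tests are choices made here for illustration.

```python
def speedup(q0, sizes):
    """Speedup N / (expected number of tests) for group sizes m_1, ..., m_d."""
    ext = list(sizes) + [1]              # individual samples act as groups of size 1
    tests_per_person = 1.0 / ext[0]      # every level-1 group is tested once
    for k in range(len(sizes)):
        prob_parent_positive = 1.0 - q0 ** ext[k]
        # Each person shares one next-level test with ext[k + 1] people.
        tests_per_person += prob_parent_positive / ext[k + 1]
    return 1.0 / tests_per_person


print(speedup(0.99, [10]))       # one level
print(speedup(0.99, [16, 4]))    # two levels
```

In this example, adding a second level raises the speedup over the one-level strategy, which is the effect that the optimization of $m_1, m_2, \ldots, m_d$ and $d$ aims to exploit.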

Closed-form expressions
In this section, we derive closed-form expressions for the optimal group sizes when d = 1 and d = 2.
The key method to derive closed-form expressions is to use the following approximation. For the function $f(x) = \ln x$, we use the Taylor approximation
$$\ln x \approx x - 1,$$
for $x$ close to 1. The above equation is repeatedly used in this paper.

One-level acceleration
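The following theorem gives closed-form expressions of the optimal group size and the maximum speedup when $d = 1$.
Theorem 3.1: For a one-level pooling strategy, the optimal group size is $m_1^* = 1/\sqrt{\ln(1/q_0)}$, and the maximum speedup is $S(m_1^*) = 1/(2\sqrt{\ln(1/q_0)})$.
Proof: For a one-level pooling strategy with group size $m_1$, we have
$$T_{\text{pooling}}(m_1) = \frac{N}{m_1}(1 + p_1 m_1) = N\left(\frac{1}{m_1} + 1 - q_0^{m_1}\right).$$
To find the optimal value of $m_1$, we need to minimize
$$\frac{1}{m_1} + 1 - q_0^{m_1} \approx \frac{1}{m_1} + m_1 \ln\frac{1}{q_0},$$
where the approximation follows from $\ln x \approx x - 1$ with $x = q_0^{m_1}$. Note that
$$\frac{\partial}{\partial m_1}\left(\frac{1}{m_1} + m_1 \ln\frac{1}{q_0}\right) = -\frac{1}{m_1^2} + \ln\frac{1}{q_0} = 0,$$
which gives the optimal group size $m_1^*$ as
$$m_1^* = \frac{1}{\sqrt{\ln(1/q_0)}}.$$
Furthermore, we have the optimal speedup
$$S(m_1^*) = \frac{1}{2\sqrt{\ln(1/q_0)}}.$$
This proves the theorem.
Table 1 shows the accuracy of the above closed-form expression of $m_1^*$ (actually, $m_1^*$ rounded to an integer) compared with the real optimal value of $m_1$ obtained in [25]. It is easily seen that our closed-form expression is very accurate.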
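The accuracy of a one-level closed form can also be checked numerically. The sketch below assumes the closed form $m_1^* = 1/\sqrt{\ln(1/q_0)}$, which is what the approximation $1 - q_0^{m} \approx m\ln(1/q_0)$ yields, and compares it with a brute-force search over integer group sizes. The function names, the search bound `max_size`, and the sample prevalences are assumptions; the brute-force values are computed here and are not taken from [25].

```python
import math


def one_level_speedup(q0, m):
    """Exact speedup of one-level pooling with group size m (error-free tests)."""
    return 1.0 / (1.0 / m + 1.0 - q0 ** m)


def closed_form_size(q0):
    """Approximate optimal group size under 1 - q0^m ~ m * ln(1/q0)."""
    return 1.0 / math.sqrt(math.log(1.0 / q0))


def brute_force_size(q0, max_size=10000):
    """Integer group size in 1..max_size with the largest exact speedup."""
    return max(range(1, max_size + 1), key=lambda m: one_level_speedup(q0, m))


for p0 in (0.1, 0.01, 0.001):
    q0 = 1.0 - p0
    approx = closed_form_size(q0)
    exact = brute_force_size(q0)
    # Columns: p0, closed-form size, brute-force size, and the two speedups.
    print(p0, round(approx, 2), exact,
          round(one_level_speedup(q0, round(approx)), 3),
          round(one_level_speedup(q0, exact), 3))
```

The closed-form size need not coincide with the integer optimum, but because the speedup is flat near its maximum, the speedup obtained at the rounded closed-form size stays close to the best achievable one.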

Two-level acceleration
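The following theorem gives closed-form expressions of the optimal group sizes and the maximum speedup when $d = 2$.
Theorem 3.2: For a two-level pooling strategy, the optimal group sizes are $m_1^* = (1/\ln(1/q_0))^{2/3}$ and $m_2^* = (1/\ln(1/q_0))^{1/3}$, and the maximum speedup is $S(m_1^*, m_2^*) = \frac{1}{3}(1/\ln(1/q_0))^{2/3}$.
Proof: For a two-level pooling strategy with group sizes $m_1$ and $m_2$, we have
$$T_1 = 1 + p_1\frac{m_1}{m_2}(1 + p_2 m_2) = 1 + m_1\left(\frac{1 - q_0^{m_1}}{m_2} + 1 - q_0^{m_2}\right).$$
To minimize $T_1$, we need to minimize
$$\frac{1 - q_0^{m_1}}{m_2} + 1 - q_0^{m_2} \approx \ln\frac{1}{q_0}\left(\frac{m_1}{m_2} + m_2\right),$$
which, for a given $m_1$, is minimized at $m_2 = \sqrt{m_1}$. To find the optimal value of $m_1$, we notice that
$$T_1 \approx 1 + 2 m_1^{3/2}\ln\frac{1}{q_0}$$
and
$$T_{\text{pooling}}(m_1, m_2) = \frac{N}{m_1}\,T_1 \approx N\left(\frac{1}{m_1} + 2\sqrt{m_1}\,\ln\frac{1}{q_0}\right).$$
The speedup can be treated as a function of $m_1$:
$$S(m_1) = \frac{1}{\dfrac{1}{m_1} + 2\sqrt{m_1}\,\ln\dfrac{1}{q_0}}.$$
We need to minimize
$$\frac{1}{m_1} + 2\sqrt{m_1}\,\ln\frac{1}{q_0},$$
whose derivative $-1/m_1^2 + \ln(1/q_0)/\sqrt{m_1}$ vanishes at $m_1^* = (1/\ln(1/q_0))^{2/3}$, so that $m_2^* = \sqrt{m_1^*} = (1/\ln(1/q_0))^{1/3}$ and $S(m_1^*) = \frac{1}{3}(1/\ln(1/q_0))^{2/3}$. This proves the theorem.
Table 2 shows the accuracy of the above closed-form expressions of $m_1^*$ and $m_2^*$ (actually, $m_1^*$ and $m_2^*$ rounded to integers) compared with the real optimal values of $m_1$ and $m_2$ obtained in [25]. It is easily seen that our closed-form expressions are very accurate, especially when $p_0$ is small.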

Multi-level acceleration
In this section, we derive closed-form expressions for the optimal group sizes and the optimal number of levels for a hierarchical pooling strategy.
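The main result of this section is the following theorem, which gives closed-form expressions of the optimal number of levels, the optimal group sizes, and the maximum speedup for all $d \ge 1$.
Theorem 4.1: For a hierarchical pooling strategy, the optimal number of levels is $d^* = \ln(1/\ln(1/q_0)) - 1$, and the maximum speedup is $1/(e p_0 \ln(1/p_0))$. The optimal group sizes are $m_j^* = e^{d^*-j+1}$, for all $1 \le j \le d^*$, which is actually $m_1^* = e^{d^*}$, $m_2^* = e^{d^*-1}$, ..., $m_{d^*}^* = e$, or equivalently, $m_1^* = 1/(e p_0)$, $m_2^* = 1/(e^2 p_0)$, ..., $m_{d^*}^* = 1/(e^{d^*} p_0)$.
The rest of the section is devoted to proving the above theorem.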

Optimal group sizes
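Now, let us consider a $d$-level hierarchical pooling strategy with group sizes $m_1, m_2, \ldots, m_d$. By Theorem 2.4, we know that
$$T_1 = 1 + \sum_{k=1}^{d} (p_1 p_2 \cdots p_k)\,\frac{m_1}{m_{k+1}},$$
which is actually
$$T_1 = 1 + \sum_{k=1}^{d} \left(1 - q_0^{m_k}\right)\frac{m_1}{m_{k+1}},$$
and approximately,
$$T_1 \approx 1 + \ln\frac{1}{q_0}\sum_{k=1}^{d} \frac{m_1 m_k}{m_{k+1}},$$
where $m_{d+1} = 1$. The above approximation makes it possible to derive the optimal group sizes in closed-form.
Since the speedup is $N/T_{\text{pooling}} = m_1/T_1$, we need to minimize $1/m_1 + \ln(1/q_0)\sum_{k=1}^{d} m_k/m_{k+1}$. Setting the partial derivatives with respect to $m_1, m_2, \ldots, m_d$ to zero gives
$$-\frac{1}{m_1^2} + \frac{\ln(1/q_0)}{m_2} = 0, \qquad m_j^2 = m_{j-1} m_{j+1}, \quad 2 \le j \le d.$$
Solving the above equations, we get
$$m_j^* = \left(\frac{1}{\ln(1/q_0)}\right)^{(d-j+1)/(d+1)}, \quad 1 \le j \le d,$$
and the speedup is
$$S(d) = \frac{1}{d+1}\left(\frac{1}{\ln(1/q_0)}\right)^{d/(d+1)}.$$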
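The optimal group sizes for a fixed $d$ can be explored numerically. The sketch below assumes the approximate cost per person $1/m_1 + \ln(1/q_0)\sum_{k=1}^{d} m_k/m_{k+1}$ with $m_{d+1} = 1$, and assumes that its minimizer is the geometric sequence $m_j = r^{d-j+1}$ with $r = (1/\ln(1/q_0))^{1/(d+1)}$; both are a reconstruction from the approximation $1 - q_0^m \approx m\ln(1/q_0)$, and all function names and example values are hypothetical.

```python
import math
import random


def approx_cost(a, sizes):
    """Approximate expected tests per person; a stands for ln(1/q0)."""
    ext = list(sizes) + [1.0]                      # m_{d+1} = 1
    return 1.0 / ext[0] + a * sum(ext[k] / ext[k + 1] for k in range(len(sizes)))


def geometric_sizes(a, d):
    """Candidate optimal sizes m_j = r^(d-j+1) with r = (1/a)^(1/(d+1))."""
    r = (1.0 / a) ** (1.0 / (d + 1))
    return [r ** (d - j + 1) for j in range(1, d + 1)]


def closed_form_speedup(a, d):
    """Speedup (1/(d+1)) * (1/a)^(d/(d+1)) attained by the geometric sizes."""
    return (1.0 / a) ** (d / (d + 1)) / (d + 1)


a = math.log(1.0 / 0.999)      # example value: q0 = 0.999
d = 3
best = geometric_sizes(a, d)
print([round(m, 2) for m in best],
      round(1.0 / approx_cost(a, best), 3),
      round(closed_form_speedup(a, d), 3))

# Random multiplicative perturbations should never beat the geometric sizes,
# so every printed cost ratio should be at least 1.
rng = random.Random(0)
for _ in range(5):
    trial = [m * rng.uniform(0.8, 1.25) for m in best]
    print(round(approx_cost(a, trial) / approx_cost(a, best), 4))
```

The approximate cost is a sum of exponentials of linear functions of $\ln m_1, \ldots, \ln m_d$, hence convex in those variables, which is why a stationary point found this way is a global minimum of the approximate cost.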

Optimal number of levels
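To find the optimal number of levels, we view the speedup as a function of $d$:
$$S(d) = \frac{1}{d+1}\left(\frac{1}{\ln(1/q_0)}\right)^{d/(d+1)}.$$
Note that
$$\frac{\partial \ln S(d)}{\partial d} = -\frac{1}{d+1} + \frac{1}{(d+1)^2}\ln\frac{1}{\ln(1/q_0)} = 0,$$
which gives the optimal number of levels $d^*$ as
$$d^* = \ln\frac{1}{\ln(1/q_0)} - 1.$$
At $d = d^*$, we have $(1/\ln(1/q_0))^{1/(d^*+1)} = e$, so that $S(d^*) = 1/(e\ln(1/q_0)\ln(1/\ln(1/q_0)))$, which becomes $1/(e p_0 \ln(1/p_0))$ when $\ln(1/q_0)$ is approximated by $p_0$.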
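The closed-form expressions stated in this paper are easy to evaluate. The following sketch computes $d^* = \ln(1/\ln(1/q_0)) - 1$, the group sizes $e^{d-j+1}$, and the maximum speedup $1/(e p_0 \ln(1/p_0))$ for two prevalences. Rounding $d^*$ to the nearest integer and the function names are assumptions made for this example; practical group sizes would also have to be integers.

```python
import math


def optimal_levels(q0):
    """Real-valued optimal number of levels d* = ln(1 / ln(1/q0)) - 1."""
    return math.log(1.0 / math.log(1.0 / q0)) - 1.0


def max_speedup(p0):
    """Maximum speedup 1 / (e * p0 * ln(1/p0))."""
    return 1.0 / (math.e * p0 * math.log(1.0 / p0))


for p0 in (0.001, 0.0001):
    d_star = optimal_levels(1.0 - p0)
    d = round(d_star)                                   # assumed rounding rule
    sizes = [math.e ** (d - j + 1) for j in range(1, d + 1)]
    # Columns: p0, d*, rounded d, group sizes m_1..m_d, maximum speedup.
    print(p0, round(d_star, 3), d, [round(m) for m in sizes],
          round(max_speedup(p0), 1))
```

For $p_0 = 0.0001$ the rounded number of levels is 8 and the speedup is just under 400, in line with the example quoted in the paper; for $p_0 = 0.001$ the rounded number of levels is 6.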

Numerical data
We now demonstrate some numerical data.
In Figure 1, for $p_0 = 0.001$, we show the speedup $S(d)$ as a function of the number of levels $d$. It can be observed that as $d$ increases, $S(d)$ first increases. However, beyond a certain point, $S(d)$ decreases as $d$ further increases. It is clear that there is an optimal value $d^* = 6$ at which $S(d)$ is maximized.
In Figure 2, for $q_0 = 0.900, 0.905, 0.910, \ldots, 0.995$, we show the maximum achievable speedup $S(q_0)$ of a hierarchical pooling strategy as a function of $q_0$. It is observed that as $q_0$ increases, $S(q_0)$ increases very rapidly.
In Figure 3, we show the maximum achievable speedup $S(1/p_0)$ of a hierarchical pooling strategy as a function of the reciprocal of the fraction of infected people $1/p_0$:
$$S(1/p_0) = \frac{1/p_0}{e\ln(1/p_0)}.$$
It can be seen that $S(1/p_0)$ is nearly a linear function of $1/p_0$. Actually, although $S(1/p_0)$ is not really a linear function of $1/p_0$, it grows faster than any sub-linear function $(1/p_0)^{1-\epsilon}$ for any small $\epsilon > 0$.
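The growth claim can be illustrated numerically. The sketch below evaluates the ratio of $S(x) = x/(e\ln x)$, with $x = 1/p_0$, to the sub-linear function $x^{1-\epsilon}$. The choice $\epsilon = 0.1$ and the sample values of $x$ are arbitrary assumptions; since the ratio equals $x^{\epsilon}/(e\ln x)$, it is increasing only once $\ln x > 1/\epsilon$, so the statement is an asymptotic one.

```python
import math


def max_speedup(x):
    """Maximum speedup as a function of x = 1/p0."""
    return x / (math.e * math.log(x))


def ratio_to_sublinear(x, eps):
    """S(x) divided by the sub-linear function x^(1 - eps)."""
    return max_speedup(x) / x ** (1.0 - eps)


eps = 0.1
for x in (1e2, 1e4, 1e8, 1e16, 1e32):
    # Columns: x, S(x)/x (slowly decreasing), S(x)/x^(1-eps) (eventually increasing).
    print(x, max_speedup(x) / x, ratio_to_sublinear(x, eps))
```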

Concluding remarks
We have derived closed-form expressions for the optimal number of levels and the optimal group sizes of a hierarchical pooling strategy. These expressions enable us to achieve the maximum possible speedup (whose closed-form expression is also available) of a hierarchical pooling strategy. Using the results in this paper, we can quickly and easily predict the performance of an optimal hierarchical pooling strategy. For instance, if the fraction of infected people is 0.0001, an 8-level hierarchical pooling strategy can achieve a speedup of nearly 400.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The