STATISTICAL METHODS, I: REGRESSION ANALYSIS
Statistics is ubiquitous in the life sciences: think of the millions of statistical tests of the efficacy of medications or medical treatments, or of trials on (sometimes many thousands of) patients (see, e.g., Cleophas et al. [1]). Less common is the use of statistics to identify one or more of a large number of factors which might trigger pain or discomfort in a single patient (“Precision Medicine”); an account was given only recently by Cleophas and colleagues [2]. Very rarely is a search done for positive or negative synergy effects (interactions) between these factors, that is, effects which go far beyond a mere addition of the individual factors. The reason is, of course, the huge number of possible combinations of two or more factors. For the sake of the patient, however, the number of tests should be as small as possible. We present a solution to this dilemma. Identifying these “suspicious” factors can be very valuable when a diagnosis turns out to be difficult to reach.
One of the best tools for discovering the reasons for a patient’s medical problems is the well-known and powerful method of (statistical) regression. In medical statistics, regression analysis is usually applied to large samples, e.g., stroke risk as a function of age, hypertension, smoking habits, etc. For diagnosis, medical statistics works with sensitivity, specificity, pre- and post-test probabilities, etc. Here we show that regression can also be very useful in the diagnosis of an individual case by detecting unknown connections between a number of “suspicious” factors. For that, doctor and patient together must first identify the factors which might trigger a certain event. The patient then undergoes several tests, in each of which (s)he is exposed to some of the suspicious factors. The number n of factors roughly determines the number of required tests; a common rule of thumb says that n factors require about n² tests. So if the number of possible factors is large, a very large number of tests is required, and a careful selection of possible factors is essential. The tests should follow a suitable test plan (experimental design) so that all factors are tested an approximately equal number of times. In statistics, this is usually called a “screening experiment”.
Of course, this method is much more laborious than a usual diagnosis, and so it will only be used in cases where conventional methods have failed. But doctors’ waiting rooms often contain patients who have gone through a long and unsuccessful series of tests that generated numerous diagnoses. This can be very frustrating and sometimes dangerous for them, and it usually takes much longer than the method we demonstrate here. The diagnostic path can be especially troublesome for patients with rare diseases, and the number of rare diseases is much higher than people usually think. Sometimes the patient can carry out the tests and measure the results themselves.
As an example, take a patient with an allergy whose triggers are unknown and for whom the usual diagnostic measures did not yield a satisfactory result. Suppose that the patient and the doctor suspect that n factors x1, x2, …, xn might explain the allergy, e.g.,
- x1 = exhaust air of the vacuum cleaner (measured in minutes of exposure)
- x2 = intake of certain candies (measured in pieces), …
and so on. A test plan might then ask the patient to be exposed to x1, x4, and x9 and to rate the resulting degree y1 of allergy on, say, a ten-point scale of severity. The second test might involve x2, x4, and x6, with a result of y2, and so on.
Then well-known statistical algorithms (see, e.g., Morris [3]) will yield a “formula” of the type
y = β0 + β1x1 + β2x2 + … + βnxn (*)
where β0 is a constant (the “intercept”) and βi estimates the influence of xi on the overall allergy level. Usually, one also determines confidence intervals [βi − ci, βi + ci] which cover the true βi with a confidence level of, say, 95%. If such an interval covers 0, as for example [−0.4, 0.7] does, the influence of the corresponding factor xi is in doubt (not statistically significant), and xi is eliminated from the list of interesting factors. This usually happens for many factors, so that eventually only a short list of suspicious factors remains, and the doctors can concentrate on these few. This reduction is often essential, because it makes a huge difference whether 100 or 3 factors have to be medically investigated. The patient might already be dead by the time the doctors come to explore factor #50…
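For reference, the standard least-squares formulas behind (*) and its confidence intervals can be written as follows. This is a textbook sketch, not taken from the cited sources: it assumes m tests with independent, normally distributed errors, and the symbols m, X, s and t are introduced here only for illustration.

```latex
% X: the m x (n+1) matrix whose rows are (1, x_1, ..., x_n), one row per test
% y: the vector of the m observed results; i runs over 0, ..., n
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y, \qquad
s^{2} = \frac{\lVert y - X\hat{\beta} \rVert^{2}}{m - n - 1}, \qquad
c_i = t_{0.975,\; m-n-1} \; s \, \sqrt{\bigl[(X^{\top} X)^{-1}\bigr]_{ii}}
```

Here t denotes the 97.5% quantile of Student's t-distribution with m − n − 1 degrees of freedom, which gives a two-sided 95% interval. The formulas require m > n + 1 and a matrix X of full rank.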
STATISTICAL METHODS, II: EXPERIMENTAL DESIGNS
If synergies between the factors are also to be detected, much more care must be taken with the design of the experiments. It might be that xi and xj are never (or only once) tested together, in which case a possible synergy between them cannot be clarified.
The fairest way would be to test every xi the same number of times, and also to test every pair xi and xj the same number of times. A new problem now arises from the rapid growth of the binomial coefficients: with 5 factors we have 10 possible pairs, but with 20 factors we already have 190. So a clever trick is needed, and we use some particular experimental designs:
A Balanced Incomplete Block Design (BIB-design), see, e.g., Lidl & Pilz [4], consists of a set P = {p1, p2, … , pv} of v „points“ and a collection B of b subsets B1, B2, … , Bb of P (called „blocks“), such that
- Each point in P belongs to the same number r of blocks
- Each Bi has the same number k of elements
- Each pair pi, pj of points belongs to the same number λ of blocks.
The pair (P, B) is then called a (v,b,r,k,λ)-design. The design is complete if B is just the collection of all k-element subsets, otherwise incomplete.
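The five parameters are not independent of each other. Two well-known counting identities link them; they are standard facts about BIB-designs and are added here only as a reminder, not as part of the definition above.

```latex
% Count the point-block incidences in two ways (left),
% and count the pairs containing one fixed point in two ways (right):
b\,k = v\,r, \qquad \lambda\,(v-1) = r\,(k-1)
```

Any candidate parameter set that violates one of these identities cannot belong to a BIB-design, which gives a quick sanity check before a construction is attempted.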
For an experiment like the one above (concerning allergies), a BIB-design can be turned into an experimental design as follows.
- The points are the factors (e.g., possible triggers for an allergy);
- Every block lists the factors which will be tested simultaneously in a test.
So a (v,b,r,k,λ)-design will test v suspected triggering factors; each test involves k suspected factors (at the same time), and one will need b tests. The first condition above ensures that every possible triggering factor is tested the same number of times (namely r), and the third that every pair of possible factors is tested together in exactly λ tests. So a BIB-design gives an experiment which is “fair” both to the factors and to the tests.
It is not trivial at all to get such a design. Constructions usually come from areas “far away”, such as finite geometries or abstract algebra (structures like groups or near-rings).
Example: From mathematical considerations (see Ke-Pilz [5]) we might get the following design (which comes “out of the blue” now, but we will not give the long mathematical derivations):
P = {1, 2, 3, 4, 5, 6, 7}, and B consists of the 14 blocks
B1={1,2,4}, B2={1,3,7}, B3={1,2,6}, B4={1,5,7}, B5={1,3,4}, B6={2,3,7}, B7={4,5,7},
B8={2,4,5}, B9={2,6,7}, B10={2,3,5}, B11={3,4,6}, B12={3,5,6}, B13={1,5,6}, B14={4,6,7}
This gives a (7,14,6,3,2)-design. Suppose we have 7 factors x1, x2, …, x7. For the first test, we try the factors x1, x2, and x4, since B1 = {1,2,4}, and so on:
| Factor \ Test | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x1 | ● | ● | ● | ● | ● |  |  |  |  |  |  |  | ● |  |  |
| x2 | ● |  | ● |  |  | ● |  | ● | ● | ● |  |  |  |  |  |
| x3 |  | ● |  |  | ● | ● |  |  |  | ● | ● | ● |  |  |  |
| x4 | ● |  |  |  | ● |  | ● | ● |  |  | ● |  |  | ● |  |
| x5 |  |  |  | ● |  |  | ● | ● |  | ● |  | ● | ● |  |  |
| x6 |  |  | ● |  |  |  |  |  | ● |  | ● | ● | ● | ● |  |
| x7 |  | ● |  | ● |  | ● | ● |  | ● |  |  |  |  | ● |  |
| Results | 49 | -2 | -28 | 3 | 54 | -1 | 51 | 98 | -31 | 69 | 18 | -35 | -25 | 22 | 3 |
Table 1: An experimental design for testing 7 factors, each of them 6 times.
So we have b=14 tests, plus a “zero test” (#15) which is added for technical reasons (the information matrix would otherwise not have full rank). One sees:
- Every test (except #15) involves k=3 factors (3 dots in every column)
- Each factor is tested in r=6 tests (6 dots in each row)
- Each pair of factors is tested together in λ=2 tests
Observe that we have tested each of the 21 possible pairs xi and xj twice in only 15 rather than 2·21=42 tests! This “magic reduction” to about ⅓ of the tests comes from the fact that each test of three factors covers three pairs at once, so that 14 tests cover 14·3=42 pair occurrences (in terms of experimental designs: we allow certain interactions to be aliased).
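The three properties just listed can also be checked mechanically. The following is a minimal sketch, assuming the test assignments are read off the columns of Table 1; the function name `design_parameters` and the variable names are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from itertools import combinations

# Factors used in tests 1..14, read off the columns of Table 1
# (the empty "zero test" #15 is not part of the design).
blocks = [
    {1, 2, 4}, {1, 3, 7}, {1, 2, 6}, {1, 5, 7}, {1, 3, 4}, {2, 3, 7}, {4, 5, 7},
    {2, 4, 5}, {2, 6, 7}, {2, 3, 5}, {3, 4, 6}, {3, 5, 6}, {1, 5, 6}, {4, 6, 7},
]
factors = list(range(1, 8))

def design_parameters(points, blocks):
    """Return (v, b, r, k, lam) if the blocks are balanced, else raise ValueError."""
    block_sizes = {len(B) for B in blocks}
    replications = Counter(p for B in blocks for p in B)
    pair_counts = Counter(
        pair for B in blocks for pair in combinations(sorted(B), 2)
    )
    r_values = {replications[p] for p in points}
    lam_values = {pair_counts[pair] for pair in combinations(sorted(points), 2)}
    if len(block_sizes) != 1 or len(r_values) != 1 or len(lam_values) != 1:
        raise ValueError("the blocks do not form a balanced design")
    return len(points), len(blocks), r_values.pop(), block_sizes.pop(), lam_values.pop()

v, b, r, k, lam = design_parameters(factors, blocks)
print((v, b, r, k, lam))        # (7, 14, 6, 3, 2)
print(b * (k * (k - 1) // 2))   # 42 pair occurrences, i.e. lam * 21
```

The same function can be applied to any other candidate list of blocks before it is used as a test plan; it raises an error as soon as one factor or one pair is treated differently from the others.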
In the last row of Table 1, we have added some (fictitious) results of the tests. Linear regression gives the best estimates according to (*) in the section “Statistical Methods, I” as
y = 3 + 51x4 + 19x5 − 41x6 (Model 1)
If one also uses interaction terms (“synergies”), one gets instead
y = 2 + 47x4 − 31x6 + 58x2x5 (Model 2)
| Test | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Real | 49 | -2 | -28 | 3 | 54 | -1 | 51 | 98 | -31 | 69 | 18 | -35 | -25 | 22 | 3 |
| Model 1 | 54 | 3 | -38 | 22 | 54 | 3 | 73 | 73 | -38 | 22 | 13 | -19 | -19 | 13 | 3 |
| Model 2 | 49 | 2 | -29 | 2 | 49 | 2 | 49 | 107 | -29 | 60 | 18 | -29 | -29 | 18 | 2 |

Table 2: Comparing the predictions of the two models with the real results.
One easily sees that Model 2 describes the real results considerably better than Model 1.
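The comparison can be reproduced with a few lines of code. The following is a minimal sketch using ordinary least squares in NumPy; it assumes that each factor enters as a 0/1 indicator (present or absent in a test) and that the test assignments and results are those of Table 1. The helper names `x` and `fit` are hypothetical. It is not necessarily the software or procedure used for the estimates quoted above, so small deviations from the rounded coefficients are to be expected.

```python
import numpy as np

# Factors present in tests 1..15, read off Table 1 (test 15 is the empty "zero test").
tests = [
    {1, 2, 4}, {1, 3, 7}, {1, 2, 6}, {1, 5, 7}, {1, 3, 4},
    {2, 3, 7}, {4, 5, 7}, {2, 4, 5}, {2, 6, 7}, {2, 3, 5},
    {3, 4, 6}, {3, 5, 6}, {1, 5, 6}, {4, 6, 7}, set(),
]
y = np.array([49, -2, -28, 3, 54, -1, 51, 98, -31, 69, 18, -35, -25, 22, 3], dtype=float)

def x(i):
    """0/1 indicator column: is factor i present in each test? (assumed coding)"""
    return np.array([1.0 if i in t else 0.0 for t in tests])

def fit(columns):
    """Least squares with intercept; returns (coefficients, residual sum of squares)."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return beta, float(residuals @ residuals)

beta1, sse1 = fit([x(4), x(5), x(6)])         # main effects only (Model 1)
beta2, sse2 = fit([x(4), x(6), x(2) * x(5)])  # with the synergy term x2*x5 (Model 2)

print("Model 1:", np.round(beta1, 1), "SSE =", round(sse1))
print("Model 2:", np.round(beta2, 1), "SSE =", round(sse2))
```

The residual sum of squares (SSE) is one simple way to quantify how much better Model 2 fits; with so few tests it should be read as a rough indication rather than as a formal test.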
Let us remark that estimating possible product terms is problematic because of the very small number of tests. We therefore first look at the “main effects” xi, remove the irrelevant ones, and then add one product xixj at a time to check which of them seem to be statistically relevant. These are then added to the relevant main effects (thereby following the so-called hereditary principle). The final result might depend on these choices and their ordering, but repeating the calculations with different choices and orderings may yield a robust selection. Keeping the number of tests small matters, since for patients it makes a big difference whether they have to undergo many more tests. Of course, repetitions of these 15 tests would also give much better results.
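One possible reading of this stepwise procedure is sketched below. It is an illustration under stated assumptions, not the exact procedure used for Model 2: the factors are coded as 0/1 indicators, x4 and x6 are assumed to be the main effects that survived the first screening, and each candidate product is ranked simply by how much it reduces the residual sum of squares. The names `indicator`, `sse` and `gains` are hypothetical.

```python
from itertools import combinations
import numpy as np

# Test assignments and (fictitious) results as in Table 1.
tests = [
    {1, 2, 4}, {1, 3, 7}, {1, 2, 6}, {1, 5, 7}, {1, 3, 4},
    {2, 3, 7}, {4, 5, 7}, {2, 4, 5}, {2, 6, 7}, {2, 3, 5},
    {3, 4, 6}, {3, 5, 6}, {1, 5, 6}, {4, 6, 7}, set(),
]
y = np.array([49, -2, -28, 3, 54, -1, 51, 98, -31, 69, 18, -35, -25, 22, 3], dtype=float)

def indicator(i):
    """0/1 column: is factor i present in each test? (assumed coding)"""
    return np.array([1.0 if i in t else 0.0 for t in tests])

def sse(columns):
    """Residual sum of squares of a least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return float(residuals @ residuals)

# Assumption: these main effects were kept after the first screening step.
base = [indicator(4), indicator(6)]
base_sse = sse(base)

# Add one product x_i * x_j at a time and record the reduction in SSE.
gains = {
    (i, j): base_sse - sse(base + [indicator(i) * indicator(j)])
    for i, j in combinations(range(1, 8), 2)
}
best_pair = max(gains, key=gains.get)
print(best_pair)  # (2, 5)
```

In practice one would replace the raw SSE reduction by a significance test for the added term and repeat the search with different starting sets of main effects, as described above.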
STATISTIC WORKS…
This section describes the search for the cause of periodic paralysis in a patient whose attacks started at the age of 15 years and continued for more than 50 years. Despite the consultation of more than 120 doctors and hospitals over decades, no cause was found. Since the patient reported an association of the paralysis attacks with a low intake of carbohydrates, the doctors first thought that a low blood sugar level might be the reason. This did not turn out to be true in this form, so doctors specializing in rare diseases considered the possibility of a periodic paralysis due to a defect in ion channels. But the ion levels in the blood were normal during the attacks, which is typical of patients suffering from normokalemic periodic paralysis. Thus it was tricky to identify the intracellular mechanism, namely too high or too low a level of sodium or potassium, respectively. The doctors mentioned that it might take a long time to check all these channels.
The patient (a mathematician) tried to support the research team by speeding up the search process. He recorded each of his meals and how much calcium, sodium, potassium, protein, etc. he had eaten. He measured his body strength by pressing on a bathroom scale right after each meal and again one hour later. The differences amounted to about ±10 kilograms. He first used regression (as in Model 1 above), computed with the software package Mathematica®, to identify the few intake components most likely to explain the differences. After about 5 weeks of carefully documented eating, he had the result
y = y(p,s) = −0.5 − 0.0048p + 0.0085s
with the interpretation that 1 hour after the intake of p mg of potassium and s mg of sodium, the patient could press the scale (on average) y(p,s) kg harder. The signs of the coefficients for p and s were significant at the 99% level, which means that potassium weakens the patient, while sodium strengthens him. The exact values of these coefficients are not so important, except that the patient now knows in advance that, for example, a typical burger will strengthen him by approximately y(420,1000) = 6.0 kg, while 100 g of banana will weaken him, since y(390,1) = −2.4 kg.
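The two worked values can be reproduced directly from the fitted formula. The following small sketch simply evaluates it; the function name `strength_change_kg` is hypothetical, and the potassium and sodium contents are the ones quoted in the text.

```python
def strength_change_kg(potassium_mg, sodium_mg):
    """Predicted change in pressing strength (kg) one hour after a meal."""
    return -0.5 - 0.0048 * potassium_mg + 0.0085 * sodium_mg

print(round(strength_change_kg(420, 1000), 1))  # typical burger: 6.0
print(round(strength_change_kg(390, 1), 1))     # 100 g of banana: -2.4
```

Such a function lets the patient estimate the effect of a planned meal in advance, keeping in mind that the coefficients are estimates with their own uncertainty.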
Then the patient ran another test, this time with a BIB-design similar to the example above, to check whether a combination of other substances might overturn this result. But no relevant combination was found, so the above result was accepted. In this case, however, no test strictly according to the BIB-plan was possible, since no food contains only potassium, calcium, and sodium and no other substances. We will return to this point later.
After this finding, the doctors knew that they had to search for a defect in a potassium channel and, more importantly, that lowering potassium should be beneficial for this particular patient. The subsequent analysis of various ion channels revealed a previously unreported defect of the promoter of the KCNJ18 channel. Interestingly, this channel had so far been considered a less likely candidate for paralysis and was not a target in well-established screening panels (see Kuhn and colleague [6]). The study and its results were recently reported in Soufi and colleagues [7]. Note that without the doctors’ hint to consider ion channels, the patient would never have conducted these experiments, and without medical competence, the results of the experiments could not have been properly interpreted. So this case might be considered a fine and successful interplay between medicine, statistics, and abstract algebra.
MORE EXAMPLES
- Food-dependent exercise-induced anaphylaxis: Contact with some allergens might be harmless and physical exercise can help a lot, while the combination can be disastrous. So one factor is neutral for the patient, the other one positive, but the combination is strongly negative! See Romano et al. [8].
- Phototoxic Dermatosis: Small UV doses can usually be tolerated without any reaction; the same applies to substances which enhance photosensitivity, like phenothiazine or furosemide. But in combination, they can cause skin reactions similar to heavy sunburn.
- Hyperkalemic Periodic Paralysis: In a mild form, this disease can usually be tolerated, but in combination with a pathogenic gene mutation, it can create severe paralysis.
- Stomach Problems: A patient complains about stomach pains after some meals. His doctor suspects that seafood might be a reason, but this alone can hardly explain the pains. The doctor can exclude a large number of food components which do not hurt the patient, but 15 “suspicious” factors remain. The following might be a typical progression of the statistical investigation. A simple regression test as in “Statistical Methods, I” quickly excludes 8 of them. For the remaining 7 components, this test does not give satisfactory results, so we might use the design in “Statistical Methods, II”. Suppose that the remaining 7 factors are sugar (=S), apples (=A), lactose (=L), walnut (=W), pepper (=P), crabs (=C), and mustard (=M). Then, according to the experimental design in “Statistical Methods, II”, the first test would be a meal with S, A, and W. A statistician then quickly finds out that C does hurt a bit (as a single factor), but that the combination A & P is the main reason for the pains, while A and P alone do not really hurt.