Mathematical temporal prediction of CD4+ lymphocytes in HIV/AIDS patients in antiretroviral treatment

Background: CD4+ lymphocyte count, measured through flow cytometry, is necessary for the follow-up of HIV-infected patients on antiretroviral therapy; however, access to this test is limited in low-income countries. The objective of this investigation is to develop a mathematical methodology that predicts, over time and for each patient, whether CD4+ values are greater than 500, between 200 and 500, or less than 200. Methods: CD4+ lymphocyte counts (greater than 500, between 200 and 500, and less than 200 cells) and leukocyte counts from 250 patients were taken on sequential dates, covering the individual ranges and their combinations. Time series of 12 prototypical patients were analyzed in search of predictive patterns; these patterns were then applied to the remaining patients in a blind study to find the probability of success of the methodology for each range and its combinations, as well as sensitivity and specificity values. Results: five patterns with predictive percentages greater than 99% were found for the distinct conditions of the methodology, with sensitivity and specificity values of 99%. Conclusions: through a predictive theoretical simplification, a temporal self-organization was established for the measurements of leukocytes and CD4+ lymphocytes, which could be useful to improve the surveillance and survival of patients.


Introduction
It is estimated that more than 36.9 million people are living with the Human Immunodeficiency Virus (HIV) and that deaths due to HIV or the Acquired Immunodeficiency Syndrome (AIDS) have exceeded 35 million since the appearance of HIV. For 2017, deaths due to HIV were estimated at 940,000 [1]. The incidence of the disease has increased, especially in Africa, the Middle East, Eastern Europe and Central Asia, so this epidemic remains a global public health problem, which makes it necessary to achieve advances in this field to reduce mortality in the population, especially in infants [2].
The CD4+ count, obtained through flow cytometry, is useful for monitoring the progress of the disease and is taken as a basis for initiating antiretroviral management. However, flow cytometry is not available worldwide, particularly in low-income countries, given its high associated costs [3]. This limitation has prompted the development of methodologies to replace this measure [4,5] or to estimate the population of CD4+ lymphocytes [6][7][8].
Some of these models have implemented machine learning techniques to predict changes in the population of CD4+ cells. These models sometimes include parameters such as viral load, hemoglobin and age, among others, with which predictions with accuracies greater than 80% [8] and specificities of 96% [9] have been obtained; nevertheless, these methods do not allow predictions to be established in delimited ranges.
Based on set theory and probability theory, in the context of the algebra of sets, a methodology was developed to predict the CD4+ lymphocyte count per µL for specific populations of total leukocytes and lymphocytes obtained from the complete blood count, achieving predictions with a probability of one [10,11]. This methodology has been applied to 500 and 800 patients, showing a predictive capability between 90% and 100% for ranges of 5000 and 400 leukocytes, respectively.
The purpose of this investigation is to generate a clinically applicable methodology for HIV-infected patients on antiretroviral management that predicts, from the absolute leukocyte count, the CD4+ population per µL in the ranges greater than 500, between 200 and 500, less than 200, and their combinations.

Population
For the present study, flow cytometry and complete blood count records from 250 patients were analyzed from a database of the enterprise "Servicios y Asesorías en Infectología", which was evaluated by an expert infectious diseases specialist. For each patient, records from different dates were considered in order to build a sequence of records for that patient.

Procedure
Given that this methodology is based on the method of theoretical physics [12], in which the observation of phenomena is simplified, the studied phenomenon was reduced to only two variables, the leukocyte count and the CD4+ cell count, in order to find essential mathematical relationships. An inductive process was conducted with the record sequences of 12 patients whose behavior was representative of the mathematical patterns, in order to establish a predictive evaluation of CD4+ lymphocytes from leukocyte values and from the sequences of these values in previous records. To this end, 5 groups of possible dynamics were established as follows:

1. Dynamics in which all the records of the sequence present CD4+ lymphocyte values >500 cells/µL.
2. Dynamics in which all the records of the sequence present CD4+ lymphocyte values in [200, 500] cells/µL.
3. Dynamics in which all the records of the sequence present CD4+ lymphocyte values less than 200 cells/µL.
4. Dynamics in which the records of the sequence present CD4+ lymphocyte values >500 cells/µL but also in [200, 500] cells/µL.
5. Dynamics in which the records of the sequence present CD4+ lymphocyte values <200 cells/µL but also in [200, 500] cells/µL.
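The five groups of dynamics listed above can be illustrated with a minimal sketch. It is not the authors' C++ software; the function names, the range labels and the `GROUPS` table are hypothetical, and the sketch assumes that the interval [200, 500] is inclusive at both ends, as the bracket notation suggests.

```python
from typing import Optional, Sequence


def cd4_range(cd4: float) -> str:
    """Map one CD4+ count to one of the three clinical ranges."""
    if cd4 > 500:
        return "high"  # > 500
    if cd4 >= 200:
        return "mid"   # [200, 500], assumed inclusive
    return "low"       # < 200


# Hypothetical lookup: set of ranges observed in a sequence -> group number (1 to 5).
GROUPS = {
    frozenset({"high"}): 1,
    frozenset({"mid"}): 2,
    frozenset({"low"}): 3,
    frozenset({"high", "mid"}): 4,
    frozenset({"low", "mid"}): 5,
}


def dynamics_group(cd4_sequence: Sequence[float]) -> Optional[int]:
    """Return the group of a patient's CD4+ sequence, or None if it matches none
    of the five groups (an empty sequence, or one mixing <200 with >500)."""
    observed = frozenset(cd4_range(value) for value in cd4_sequence)
    return GROUPS.get(observed)


print(dynamics_group([620, 710, 540]))  # all values > 500: group 1
print(dynamics_group([450, 520, 610]))  # values in [200, 500] and > 500: group 4
```

Sequences that combine values below 200 with values above 500 fall outside the five groups described in the text, so the sketch returns `None` for them rather than guessing a group.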
Afterwards, considering these groups, the mathematical predictive patterns found were used to develop software in C++ to automate the prediction process. It is worth noting that probability theory was applied to calculate the predictive accuracy over the totality of patients, which means that this theory is the one that yields predictions about the phenomenon studied, as in quantum physics [12].
In addition, a blind study, in which the values of the variables of the remaining cases were masked, was performed in order to establish a statistical parameter for calculating sensitivity and specificity; this step, however, does not by itself produce predictions.

Ethical aspects
According to the scientific, technical and administrative regulations for investigation in health,

Statistical analysis
The CD4+ values (cells/µL) were unmasked to obtain false positives and negatives as well as true positives and negatives through a contingency table, from which sensitivity and specificity were calculated.
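As an illustration of this step, the following sketch computes sensitivity and specificity from a 2×2 contingency table using the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The function names and the example pairs are hypothetical and are not data from the study.

```python
from typing import Iterable, Tuple


def contingency_counts(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) from (predicted, actual) pairs for one CD4+ range."""
    tp = fp = tn = fn = 0
    for predicted, actual in pairs:
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and not actual:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from the four cells of the table."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: each pair is (predicted in range, actually in range after unmasking).
example = [(True, True), (True, True), (False, True),
           (False, False), (True, False), (False, False)]
sens, spec = sensitivity_specificity(*contingency_counts(example))
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a multi-range setting such as this one, the table would be built once per range, treating "CD4+ in this range" as the positive class.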

Results
In total, 1022 records corresponding to 250 patients were analyzed, for which 91 patients presented
When applying the conditions of the methodology (see predictive results), the predictive accuracy varied between 0.96 and 1 for all the patterns for the ranges evaluated in time (Table 3).

2. If case a is presented, the most likely outcome is that in subsequent measurements, when leukocyte populations are ≥ 3.7 cells/mm³, the associated CD4+ populations will be >500 cells/µL.
3. If case b is presented, the most likely outcome is that if in the following measurements leukocyte values are ≥ 4 cells/mm³, the associated CD4+ populations will be in [200, 500] cells/µL.
4. If case c is presented, the most likely outcome is that when the leukocyte value in the following measurement is between 2 and 3 cells/mm³, the associated CD4+ populations will be <200 cells/µL.
5. If case d is presented, the leukocyte values are in [3.0, 3.9] cells/mm³ and the following measurement is within that range, then the associated CD4+ populations will be >500 cells/µL or in [200, 500] cells/µL.
6. If case d is presented, the measurement with a CD4+ value in [200, 500] cells/µL also presents a leukocyte value ≥ 4 cells/mm³, and the measurement with a CD4+ value >500 cells/µL also presents a leukocyte value ≥ 3.7 cells/mm³, then the most likely outcome is that CD4+ values are in [200, 500] or >500 cells/µL.
7. If case e is presented, and the value of leukocytes in the measurement that presents
8. If case e is presented, the leukocyte value in the measurement with CD4+ <200 cells/µL is higher than 3 cells/mm³, and the record with CD4+ in [200, 500] cells/µL presents a leukocyte value lower than 3 cells/mm³, then it is more likely that, if the leukocyte value is higher than 3 cells/mm³, the CD4+ measurement will be in [200, 500] cells/µL.
Thus, after two measurements, the leukocyte values are examined and, with these seven steps, the CD4+ counts are predicted probabilistically over time.
The sensitivity and specificity values obtained were 99%.
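The single-range rules above (steps 2 to 4) can be sketched as a small lookup. This is an illustrative simplification, not the authors' software: the function name is hypothetical, the case labels a, b and c are taken as given in the rules, the leukocyte value is assumed to be expressed in the same units the rules use, and "between 2 and 3" is assumed to include both endpoints. The rules for cases d and e depend on the leukocyte values of earlier records and are left out here.

```python
from typing import Optional


def predict_cd4_range(case: str, leukocytes: float) -> Optional[str]:
    """Return the most likely CD4+ range for the next measurement, or None when
    no single-range rule (steps 2 to 4) applies to this case and leukocyte value."""
    if case == "a" and leukocytes >= 3.7:
        return ">500"
    if case == "b" and leukocytes >= 4.0:
        return "[200, 500]"
    if case == "c" and 2.0 <= leukocytes <= 3.0:  # endpoints assumed inclusive
        return "<200"
    return None  # cases d and e, or a value outside the stated thresholds


print(predict_cd4_range("a", 4.2))
print(predict_cd4_range("c", 2.5))
print(predict_cd4_range("a", 3.0))
```

Returning `None` instead of a default range keeps the sketch from making a prediction where the text states no rule.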

Discussion
This is the first investigation in which five types of dynamics of CD4+ lymphocyte counts are predicted from the absolute leukocyte count: counts >500, between 200 and 500, and <200 cells/µL, as well as counts that fluctuate between the 200 to 500 range and either >500 or <200 cells/µL. The predictions were made for 250 HIV-infected patients in the context of probability theory, achieving a mathematical and physical simplification of the phenomenon with a predictive precision of 99% and sensitivity and specificity values of 99%.
With this methodology, mathematical relationships are established between the leukocyte counts and the values of CD4+ lymphocytes over time in ranges of clinical interest. Since the probability values were always above 0.96 and three of the types of dynamics have probabilities of 1, this suggests that the phenomenon itself presents a strong underlying deterministic order and that the method is highly accurate in predicting the counts. Given these considerations, this methodology could be useful for clinicians to follow up patients over time and to evaluate the effectiveness of antiretroviral regimens, especially in low-income countries [13,14], since only a complete blood count is required to establish measurements, which could also improve the medical assistance provided as well as patients' survival.
More recently, data mining algorithms such as Random Forest [19] have been used to classify CD4+ values <100 cells/µL, obtaining results close to 100% by implementing rules that involve variables such as liver function, age, marital status, employment, education status, residence condition, functional status, WHO clinical stage, baseline and current antiretroviral regimen as well as time on treatment, baseline CD4/CD8 ratio, religion and weight, among others. In contrast, the methodology developed here offers a simplification because it requires only one variable to make predictions and, in addition, distinguishes between ranges of clinical interest.
Mathematical thinking seeks to establish patterns [20]; this is why, when addressing a phenomenon such as the one in this article, efforts must be focused on asking the right questions, since they make it possible to separate the relevant data from the irrelevant, which is of great utility in medicine. According to Frenkel: "my experience is that only about 10-15% of the information that the doctors collected was ever used when they made the diagnosis or treatment recommendations" [21] and "Yakov Isaevich used to say that doctors' thinking was well adapted to analyzing particular patients and making decisions on a case-by-case basis. But this also made it sometimes difficult for them to focus on the big picture and try to find general patterns and principles", which highlights the need for mathematical thinking to find patterns that reveal order in the phenomena studied.
In this sense, the acausal impact of physics and mathematics is evidenced in the search for objective patterns that adequately describe distinct phenomena of nature. Following this line of thinking [12], different predictive methodologies have been developed to address the problem of obtaining CD4+ lymphocyte counts from other variables. One example is set theory, which has made it possible to organize triplets of total leukocyte, lymphocyte and CD4+ counts to predict the counts of the last cell line, with a precision of up to 100% in specific ranges of leukocytes [22].
The same basis has led to predictive studies in different areas of medicine that simplify the variables commonly analyzed and establish the fundamental characteristics of each situation. Examples include predictions of mortality in critical scenarios through set theory and dynamical systems [23], of the binding of peptides to HLA class II [24] through entropy and combinatorics, of the length of malaria epidemics [25] through probability, and of the behavior of cardiac dynamics [26] through a chaotic mathematical law [27].

Availability of data and material
Both CD4+ cell and leukocyte counts will be made available on reasonable request.

Authors' contributions
JR and SP: designed the study.
SP: interpreted the mathematical results and developed the initial manuscript draft.
CP: interpreted the clinical results and performed the statistical analyses.
All authors contributed to reviewing the manuscript and approved the final version for publication.

Ethics approval and consent to participate
Considering that this investigation performed mathematical calculations on a previously systematized and anonymized database of the enterprise "Servicios y Asesorías en Infectología", that personal information of the patients was not present when the data were analyzed, and that no diagnostic or therapeutic interventions were performed on patients, ethics approval and patient consent were not necessary.

Consent for publication
Not applicable.