Algorithm results on test samples
Figure 1 shows that, among the six algorithms applied to all 8 (4 per sex) test samples, XGBoost achieved the highest f1 score for males, 78%, alongside 92% for females. It was followed closely by the RF algorithm, which scored 76% for males and the highest female score of 93%. EN scored 72% for males and 90% for females, and SVM 71% for males and 89% for females. KNN obtained an f1 score of 54% for males and 88% for females, while LGBM performed worst for males with an f1 score of 48%, alongside 88% for females, Table 3.
Algorithm results on left-out samples
Similarly, the six algorithms were evaluated on all four left-out samples. The f1 scores varied substantially between males and females for all the algorithms, Figure 1. XGBoost obtained 46% for males and, jointly with KNN, the highest female score of 85%; KNN scored 45% for males. SVM scored 86% for males and 35% for females. LGBM secured the highest male score of 87% but only 33% for females. RF scored 86% for males and the lowest female score of 30%, while EN was the worst-performing algorithm for males with an f1 score of 76%, alongside 33% for females, Table 3. These out-of-sample results are generally worse than the test-sample results presented above, consistent with over-fitting of the data.
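The sex-stratified f1 scores reported above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' pipeline; `f1_by_group` is a hypothetical helper (in practice one would call scikit-learn's `f1_score` on each subgroup), and the toy labels are illustrative assumptions.

```python
def f1_by_group(y_true, y_pred, groups):
    """Compute the F1 score separately for each subgroup (e.g. sex)."""
    scores = {}
    for g in sorted(set(groups)):
        tp = fp = fn = 0
        for t, p, gg in zip(y_true, y_pred, groups):
            if gg != g:
                continue
            tp += (p == 1 and t == 1)
            fp += (p == 1 and t == 0)
            fn += (p == 0 and t == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[g] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores

# Toy example: 1 = HIV positive, observations grouped by sex
y_true = [1, 1, 0, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
scores = f1_by_group(y_true, y_pred, groups)
```

Reporting the score per sex, rather than pooled, is what exposes the large male/female gaps seen in Table 3.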
Table 3. F1 score for the algorithms on the test, left-out and train samples

| Samples | XGBoost | KNN | SVM | RF | EN | LGBM |
|---|---|---|---|---|---|---|
| Males, test | 0.78 | 0.54 | 0.71 | 0.76 | 0.72 | 0.48 |
| Females, test | 0.92 | 0.88 | 0.89 | 0.93 | 0.90 | 0.88 |
| Males, left-out | 0.46 | 0.45 | 0.86 | 0.86 | 0.76 | 0.87 |
| Females, left-out | 0.85 | 0.85 | 0.35 | 0.30 | 0.33 | 0.33 |
| Males, train | 0.74 | 0.52 | 0.67 | 0.73 | 0.68 | 0.45 |
| Females, train | 0.91 | 0.87 | 0.89 | 0.92 | 0.88 | 0.88 |
Imputation results for the entire dataset
We used the XGBoost algorithm in the subsequent analysis, where we included all countries in the data. Table 4 shows the results of the two different imputation methods on all variables. The XGBoost imputation method performed slightly better than the MICE imputation technique on all features: the f1 scores on the validation set are 79.3% versus 73.3% for males and 93.1% versus 91.2% for females. Given this performance, we used the XGBoost imputation method in further analysis.
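As a toy illustration of model-based imputation, a missing column can be filled in by a model fitted on the complete rows. This is a deliberately simplified sketch: the gradient-boosted model of the XGBoost approach is swapped for a one-predictor least-squares fit, MICE-style chained iteration is omitted, and `impute_by_regression` and the column layout are assumptions, not the authors' implementation.

```python
def impute_by_regression(rows, target_col, predictor_col):
    """Fill missing values (None) in target_col using a simple
    least-squares fit on predictor_col, trained on complete rows.
    Assumes predictor_col itself has no missing values."""
    complete = [(r[predictor_col], r[target_col]) for r in rows
                if r[target_col] is not None]
    n = len(complete)
    mx = sum(x for x, _ in complete) / n
    my = sum(y for _, y in complete) / n
    sxx = sum((x - mx) ** 2 for x, _ in complete)
    sxy = sum((x - mx) * (y - my) for x, y in complete)
    slope = sxy / sxx if sxx else 0.0
    for r in rows:
        if r[target_col] is None:
            r[target_col] = my + slope * (r[predictor_col] - mx)
    return rows

# Toy data: third row is missing its second value; the fit (y = 2x)
# on the complete rows imputes it as 6.0
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, None]]
impute_by_regression(rows, target_col=1, predictor_col=0)
```

A real XGBoost or MICE imputation would cycle this predict-and-fill step over every incomplete column until the imputed values stabilise.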
Table 4. Performance of the imputation methods on variables with test samples
TP: True Positive, FP: False Positive, FN: False Negative, TN: True Negative, PPV: Positive Predictive Value, S: Sensitivity

| | TP | FP | FN | TN | F1 (%) | S (%) | PPV (%) |
|---|---|---|---|---|---|---|---|
| Complete with MICE imputation (males) | 1,952 | 636 | 788 | 30,176 | 73.3 | 71.2 | 75.4 |
| Complete with MICE imputation (females) | 4,116 | 220 | 576 | 31,172 | 91.2 | 87.7 | 94.9 |
| Complete with XGBoost imputation (males) | 522 | 110 | 163 | 7,593 | 79.3 | 76.2 | 82.6 |
| Complete with XGBoost imputation (females) | 1,051 | 33 | 122 | 7,815 | 93.1 | 89.6 | 97.0 |
| Complete with XGBoost imputation, males (15 variables) | 522 | 117 | 163 | 7,586 | 78.9 | 76.2 | 81.7 |
| Complete with XGBoost imputation, females (12 variables) | 1,045 | 33 | 128 | 7,518 | 92.8 | 89.1 | 96.9 |
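The F1, sensitivity, and PPV columns of Table 4 follow directly from the confusion counts; a minimal sketch, using the counts that reproduce the reported MICE (males) row (S 71.2%, PPV 75.4%, F1 73.3%):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Sensitivity, PPV, and F1 (all in %) from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # recall among true positives
    ppv = tp / (tp + fp)           # precision among flagged positives
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"S": 100 * sensitivity, "PPV": 100 * ppv, "F1": 100 * f1}

# Counts behind the MICE-imputation (males) row of Table 4
mice_males = confusion_metrics(tp=1952, fp=636, fn=788, tn=30176)
```

Note that TN enters none of the three metrics, which is why F1 is preferred here: the large HIV-negative majority would otherwise dominate an accuracy-style score.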
Variable selection and direction of associations
We conducted an SFFS procedure to determine the saturation limit, selecting variables based on the f1 score. As a result, the 15 and 12 most influential features were selected for males and females respectively, Figure 2. To understand how each feature contributes to the output of the model, we plot SHAP values in Figures 3 and 4 for males and females respectively. The variables are displayed ranked by Shapley value. Points on the left represent observations that shift the predicted value in the negative direction, while points on the right shift the prediction in the positive direction. The graph summarises the impact of the explanatory features on the model output; features that increase or decrease the risk of HIV infection are coloured red and blue respectively. Being older, having never attended school, having the highest level of education, being in the highest school grade, avoiding pregnancy, being on TB treatment, drinking alcohol, being an urban dweller, being aware of one's HIV status, being wealthy, being unmarried, and being circumcised are predictive of HIV positivity.
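SHAP values of the kind plotted in Figures 3 and 4 are Shapley values of the model's prediction for each observation. The following is a brute-force sketch for a toy model only (in practice one would use the `shap` library's tree explainer on the fitted XGBoost model); `shapley_values`, the `predict` function, and the baseline are illustrative assumptions.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.
    Features absent from a coalition are held at their baseline value."""
    n = len(x)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 over the other features
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi += weight * (predict(with_i) - predict(without_i))
        phis.append(phi)
    return phis

# Toy linear model: each feature's Shapley value is its weighted deviation
# from the baseline, and the values sum to f(x) - f(baseline)
predict = lambda v: 2 * v[0] + 3 * v[1]
phis = shapley_values(predict, x=[1.0, 1.0], baseline=[0.0, 0.0])
```

The enumeration is exponential in the number of features; TreeSHAP exists precisely to make this tractable for the 12-15 selected variables and tens of thousands of respondents.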
Scenarios
A. 95% of individuals living with HIV know their status
Table 5 shows confusion matrices on the test samples. A sensitivity of 95% for males would require testing 4,154 individuals out of 8,388 (49.52%) to identify 651 HIV positives from the 685 persons living with HIV. With a PPV of 15.67%, approximately six individuals would therefore need to be tested to find one HIV-positive person. Among females, 3,301 individuals out of 9,021 (36.59%) would require testing to detect 1,115 HIV positives out of the 1,173 persons living with HIV; the corresponding PPV is 33.77%, or approximately three individuals tested per HIV-positive person found.
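The quantities in scenario A reduce to simple ratios over the confusion counts; a minimal sketch, assuming the male counts that reproduce the reported figures (651 true positives among 4,154 flagged for testing):

```python
def screening_summary(tp, fp, fn, tn):
    """Share of the population tested, PPV, and tests needed per case found."""
    tested = tp + fp
    total = tp + fp + fn + tn
    return {
        "tested_pct": 100 * tested / total,   # fraction flagged for testing
        "ppv_pct": 100 * tp / tested,         # precision among those tested
        "tests_per_case": tested / tp,        # number needed to test
    }

# Male counts behind the 95%-sensitivity scenario
males = screening_summary(tp=651, fp=3503, fn=34, tn=4200)
```

The tests-per-case figure is just the reciprocal of the PPV, which is why the female screen (PPV 33.77%) is roughly twice as efficient as the male one.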
B. 95% or higher probability of being HIV positive
We identified 348 (4.14%) of the 8,388 males and 975 (10.81%) of the 9,021 females as a high-risk population. At this threshold, 335 males would have been correctly identified as HIV positive out of the 685 males living with HIV, while 969 females would have been correctly identified out of the 1,173 individuals, Table 5.
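Scenario B simply thresholds the model's predicted probabilities at 0.95 and counts outcomes; a minimal sketch with illustrative inputs (`high_risk_confusion` is a hypothetical helper, not the authors' code):

```python
def high_risk_confusion(y_true, probs, threshold=0.95):
    """Confusion counts when flagging only predictions at or above threshold."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, probs):
        flagged = p >= threshold
        if flagged and t == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Toy example: only the first two predictions clear the 0.95 bar
counts = high_risk_confusion([1, 0, 1, 0], [0.99, 0.96, 0.50, 0.10])
```

Raising the threshold trades sensitivity (more FN) for precision (fewer FP), which is exactly the contrast between scenarios A and B in Table 5.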
Table 5. PLHIV who know their status, and 95% or higher probability of being HIV positive

| Scenario | TP | FP | FN | TN | PPV (%) |
|---|---|---|---|---|---|
| Know their status (males) | 651 | 3,503 | 34 | 4,200 | 15.67 |
| Know their status (females) | 1,115 | 2,186 | 58 | 5,662 | 33.77 |
| 95% or higher probability of being HIV positive (males) | 335 | 13 | 350 | 7,690 | 96.26 |
| 95% or higher probability of being HIV positive (females) | 969 | 6 | 204 | 7,842 | 99.26 |