Identifying the key nodes of HIV molecular transmission network among men who have sex with men in Guangzhou, China: A cross-sectional study


Background

Identifying the most influential spreaders in HIV transmission networks is crucial for developing effective prevention strategies. This study aimed to identify the key nodes of the HIV molecular transmission network among men who have sex with men (MSM) in order to inform such strategies.
Methods

We analyzed HIV-1 pol sequences provided through the Center for Disease Control and Prevention, Guangzhou, China. Sequences were obtained from MSM newly diagnosed with HIV during 2015–2017. We calculated pairwise genetic distances, identified linked pairs of sequences (those with a distance ≤ 1.5%), and examined the key nodes among these potential transmission partners.
Results

Of 184 men who have sex with men (MSM) recently diagnosed with HIV, 40.76% were linked to other MSM. Social network analysis detected 9 key nodes. In the multivariate logistic regression model, young MSM born in the 1990s and 1980s were 0.06 and 0.12 times as likely, respectively, to be a key node as older MSM born in the 1970s and before.
Conclusions

Many subgroups in the HIV molecular transmission network among MSM in Guangzhou were connected through shared members. Some HIV-infected MSM, identified as key nodes, mediated the transmission of HIV among different subpopulations. Young MSM were less likely than older MSM to promote HIV transmission.


Background
HIV/AIDS has long been a major public health concern, and the pattern of HIV transmission in China has changed in recent years. According to the December 2017 report on the AIDS epidemic in China [1], sexual transmission accounted for more than 90% of infections, and 26.86% of the sexually transmitted infections occurred among men who have sex with men (MSM). A UNAIDS report indicated that MSM have a 24 times higher risk of HIV infection than heterosexual men [2].
Several social factors contribute to the high incidence of HIV among MSM according to prior studies [3,4]. Ethan Morgan and his colleagues argued that investigations should focus on the networks of the targeted population rather than on traditional epidemiologic factors such as geographic areas of high incidence [5]. Social networks are conceived as stable patterns of interactions among people, and social network analysis allows for the assessment of the perceived relationships of individuals [6]. The Transmission Reduction Intervention Project (TRIP) used a network-based contact tracing intervention and showed that it was more effective and economical [7] in detecting undiagnosed HIV-positive people [8], as well as in finding and treating people early in their HIV infection [9].
From an epidemiological perspective, an intervention that changes the behavior of, or removes, a small number of individuals in the target population is the most cost-effective means of constraining disease transmission. Therefore, it is essential to identify the small key population that plays an important role in HIV transmission.
Samuel R Friedman and colleagues noted in a review article that a person's position within the network can have an important impact on HIV transmission and that social network research has potential for interventions [10]. Yuri A Amirkhanian et al. conducted several two-arm randomized trials and found that interventions that engage the identified influence leaders of target MSM social networks to communicate theory-based counseling and advice can significantly reduce sexual risk behavior [11,12]. Network leaders were identified with the classic sociometric method, which is a form of network structure analysis.
Analysis of network structure provides an effective way to determine the location and role of each member, which helps to identify the key nodes. Stéphane Helleringer and Hans-Peter Kohler revealed important differences in the structural position of HIV-positive individuals when they analyzed sexual network structure and the spread of HIV in Africa in a cross-sectional sociocentric survey [13]. Given the hidden nature of the MSM population, it is difficult in traditional field epidemiologic investigations to confirm the relationship ties between any two members of the community, which is the first step in analyzing network structure. Phylogenetics makes network structure analysis possible in HIV research. Inferring putative transmission is the process of using molecular phylogenetic analysis of HIV sequences to identify transmission events in groups of individuals [14]. The large clinical trial of HIV antiretroviral therapy, HPTN 052, was an example of this application [15]. Thus, we can use the linkage between sequences to reconstruct the HIV transmission network at the molecular level. For example, Ethan Morgan et al. combined molecular genetic network data with social, sexual and Facebook network data from the same cohort to examine potential overlap between the networks [16].
However, most studies on HIV transmission networks have focused on HIV molecular clusters and their associated factors [17,18,19] rather than on network structure analysis, such as the cohesive subgroup method, which is more helpful for identifying one or more specific key nodes in the network. The purpose of this analysis is to identify key nodes of the molecular transmission network among MSM, which has significant potential for developing effective HIV prevention strategies.

Sources of Sequence Data and Inclusion Criteria
In this study, the network boundary, which is important for whole-network analysis, was defined as men newly diagnosed as HIV-positive who were infected through sex with men in the HIV

Identification of HIV molecular transmission networks
The HIV molecular transmission network was identified on the basis of genetic distance [21]. Putative transmission ties were identified by dichotomizing the data according to whether the pairwise genetic distance between sequences was less than 0.015 substitutions per site [22,23]. The Tamura-Nei 93 pairwise genetic distances were calculated with MEGA V.7.0 [24].
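The dichotomization step can be illustrated with a minimal Python sketch. It assumes that the pairwise TN93 distances have already been computed and exported into a dictionary keyed by sequence pairs. The function name `build_network`, the sequence labels and the distance values are hypothetical and are not taken from the study data.

```python
def build_network(distances, threshold=0.015):
    """Return an adjacency dict linking sequences whose distance is below threshold.

    distances: dict mapping (seq_a, seq_b) tuples to a pairwise genetic distance
    in substitutions per site (assumed to be precomputed, e.g. TN93).
    """
    adjacency = {}
    for (a, b), dist in distances.items():
        # Register both sequences so that unlinked ones still appear as isolates.
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if dist < threshold:  # strict inequality, following the Methods text
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical toy distances between four sequences.
toy_distances = {
    ("S1", "S2"): 0.004,
    ("S1", "S3"): 0.021,
    ("S2", "S3"): 0.012,
    ("S1", "S4"): 0.060,
    ("S2", "S4"): 0.055,
    ("S3", "S4"): 0.048,
}
network = build_network(toy_distances)
linked = sorted(node for node, nbrs in network.items() if nbrs)
share_linked = len(linked) / len(network)
```

In this toy example `linked` holds the sequences with at least one putative transmission tie, and `share_linked` is the proportion of linked sequences, the same kind of quantity as the percentage of linked MSM reported in this study.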

Cohesive subgroups analysis
All social network analyses were conducted using UCINET 6.0.
Cohesive subgroup analysis is a powerful and mathematically rigorous method for characterizing network robustness. Its strength lies in its capability to detect strong connections among nodes that not only have no neighbors in common but may also be distantly separated in the graph [25].

2-cliques
A clique is a subgroup of actors in which each actor is adjacent to every other actor, and to which no further actor can be added without violating this condition [26]. A 2-clique relaxes the adjacency requirement so that every pair of members is connected by a path of length two or less. In our study, we set the minimum clique size to three.
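As an illustration of the definition, the following Python sketch enumerates 2-cliques by first linking every pair of nodes that lie within two steps of each other and then listing the maximal cliques of that expanded graph with the Bron-Kerbosch procedure. It is a sketch under the standard n-clique definition and is not a reproduction of the UCINET routine. The function names and the toy graph are hypothetical.

```python
def two_step_graph(adj):
    """Link every pair of nodes whose geodesic distance is 1 or 2."""
    reach = {}
    for node, nbrs in adj.items():
        within_two = set(nbrs)
        for nbr in nbrs:
            within_two |= adj[nbr]
        within_two.discard(node)  # no self-loops
        reach[node] = within_two
    return reach


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def two_cliques(adj, min_size=3):
    """Return 2-cliques with at least min_size members, as sorted lists."""
    cliques = maximal_cliques(two_step_graph(adj))
    return sorted(sorted(c) for c in cliques if len(c) >= min_size)


# Hypothetical toy network: a chain A - B - C - D.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
result = two_cliques(chain)
```

In the chain, A and D are three steps apart, so they cannot belong to the same 2-clique, while {A, B, C} and {B, C, D} each satisfy the two-step condition and overlap in B and C.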
Clique co-membership
When there are a large number of cliques, the result of cohesive subgroup analysis is difficult to interpret because the overlap between cliques can hide features of the structure. One way to remove or reduce the overlap is to perform additional analysis such as clique co-membership [26]. The first step is to merge the cliques that share more than 2/3 of their members. If many cliques remain after the first step, the cliques that share more than 1/3 of their members can be merged again [27]. From the resulting small number of cliques, we can detect a set of key nodes acting as bridges between subgroups.
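The two-step merging rule can be sketched in Python as follows. The text does not state which clique the shared fraction refers to, so this sketch assumes that the overlap is measured relative to the smaller of the two cliques and that merging proceeds greedily until no pair qualifies. Both choices, the function names and the toy cliques are assumptions made for illustration.

```python
from fractions import Fraction


def merge_cliques(cliques, threshold):
    """Greedily merge cliques whose shared fraction exceeds threshold.

    Assumption: the shared fraction is |A & B| / min(|A|, |B|).
    """
    groups = [set(c) for c in cliques]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                shared = len(groups[i] & groups[j])
                smaller = min(len(groups[i]), len(groups[j]))
                if Fraction(shared, smaller) > threshold:
                    groups[i] |= groups[j]
                    del groups[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return groups


def bridging_nodes(groups):
    """Return the nodes that belong to more than one group."""
    counts = {}
    for group in groups:
        for node in group:
            counts[node] = counts.get(node, 0) + 1
    return sorted(node for node, count in counts.items() if count > 1)


# Hypothetical toy cliques, each with at least three members.
toy_cliques = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"e", "f", "g"},
    {"x", "y", "z"},
]
first_pass = merge_cliques(toy_cliques, Fraction(2, 3))
second_pass = merge_cliques(first_pass, Fraction(1, 3))
bridges = bridging_nodes(second_pass)
```

In this example the first two cliques share three of four members and are merged, after which node "e" is the only member common to two groups and is therefore flagged as a candidate bridge.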
Lambda sets
A lambda set, based on the property that members of the set have greater edge connectivity with other members than with non-members, corresponds to a particular hierarchical clustering of the nodes in a network. It is a maximal subset of actors who have more edge-independent paths connecting them to each other than to outsiders. Actors in a lambda set with connectivity λ have a minimum of λ edge-independent paths linking any one to any other. When λ is large, a lambda set describes a subset that is relatively difficult to disconnect by means of edge removals [28]. In infectious disease research, lambda sets can reveal which actors are the most active in a subgroup, which is important for disease control.
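The idea of edge-independent paths can be made concrete with a short Python sketch. It counts edge-independent paths between two nodes as a unit-capacity maximum flow and then groups nodes whose pairwise edge connectivity is at least a chosen level. This is an illustrative sketch, not the UCINET implementation, and the function names and the toy graph are hypothetical.

```python
from collections import deque


def edge_connectivity(adj, source, target):
    """Count edge-independent paths between two nodes (unit-capacity max flow)."""
    capacity = {u: {v: 1 for v in nbrs} for u, nbrs in adj.items()}
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v, cap in capacity[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return flow
        # Push one unit of flow back along the path that was found.
        v = target
        while parent[v] is not None:
            u = parent[v]
            capacity[u][v] -= 1
            capacity[v][u] += 1
            v = u
        flow += 1


def lambda_sets(adj, level):
    """Group nodes whose pairwise edge connectivity is at least level."""
    groups = []
    for node in sorted(adj):
        for group in groups:
            if edge_connectivity(adj, node, group[0]) >= level:
                group.append(node)
                break
        else:
            groups.append([node])
    return [group for group in groups if len(group) > 1]


# Hypothetical toy network: two triangles joined by the single tie C - D.
toy = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
level_two = lambda_sets(toy, 2)
level_one = lambda_sets(toy, 1)
```

Comparing each node with only one representative of a group is sufficient here because "connected by at least k edge-independent paths" is a transitive relation. In the toy network the two triangles separate at level 2 and nest inside a single set at level 1, which mirrors the hierarchical nesting described above.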

Statistical analysis
All statistical analyses were conducted using SAS V.9.4. A multivariate logistic regression model was used to analyze the demographic characteristics of the key nodes.

Network visualization
The network data were visualized and analyzed using UCINET 6.0.

Results
Of the 184 HIV-1 sequences from MSM newly diagnosed with HIV in 2015-2017, 75 had at least one relationship tie with another sequence (Fig. 1: Network diagram of the 75 nodes that had at least one relationship tie with another among 184 sequences). The characteristics of the participants are presented in Table 1. All percentages are row percentages.
Cohesive Subgroup Analysis

2-cliques
Social network analysis identified 14 cliques that included at least 3 nodes. The largest clique included 24 members, and some cliques shared members. Cliques 1 through 8 shared many members, whereas clique 9, which included only 4 members, did not share any member with the others (Table 2).
All of the above 5 nodes were nested hierarchically in the set with a λ value of 1, which had the largest number of members. These five nodes had the most relationship ties in the set and occupied the most active, central positions. Furthermore, there were 3 lambda sets with a λ value of 3, in which at least three independent paths connected any two nodes.
Interestingly, whether we analyzed the 2-cliques or the lambda sets, we always found a small group of four members, {25, M101, M103, M104}, that was independent of every other group and showed relatively stable characteristics. The four members, aged under 30, were born in the late 1980s and 1990s, and each member had a relationship tie with the other three. ... were confirmed as the most active nodes in one subgroup.
We analyzed the demographic characteristics of these key nodes. In the multivariate logistic regression model, young MSM born in the 1990s and 1980s were 0.06 and 0.12 times as likely, respectively, to be a key node as older MSM born in the 1970s and before, after controlling for educational level, marital status and year of diagnosis (Table 4).

Discussion
To respond to the high HIV prevalence among MSM in China, it is necessary to analyze the characteristics of the transmission network.
One study conducted in San Diego, California found that 54% of individuals in the network of HIV-infected MSM were connected to others by at least one putative transmission link [22]. In our study, 40.76% of actors in the network of MSM newly diagnosed with HIV in Guangzhou in 2015-2017 had at least one transmission tie with another. The degree of clustering was lower than in the San Diego study, which the limited sample size and the different setting may partly explain. Nevertheless, the result indicates that transmission relationships were widespread among HIV-infected MSM in Guangzhou.
One of the key features of a network for controlling infectious disease is the location and role of its members, which can be detected by network structure analysis. By exploring centrality or cohesive subgroups, network structure analysis can detect a set of specific key nodes that have a significant influence on disease transmission dynamics.
Using the accessibility-based cohesive subgroup analysis known as the 2-clique method, we detected 14 cliques. Furthermore, the clique co-membership method identified four key nodes, {30, M026, R12, M056}, acting as brokers between subgroups. These four nodes occupied important bridge locations and are critical for understanding and controlling the spread process as well as for developing effective prevention strategies. A study of a network approach to HIV prevention supports this view: candidates selected for their network position because they connect groups of otherwise disconnected individuals (such individuals are known as "bridging actors") were more likely to enhance the diffusion of innovative HIV prevention interventions than other centrally located popular opinion leaders [29]. The difference is that we used a different approach to identify bridging actors. John A. Schneider and his colleagues identified bridging actors by calculating bridging scores using betweenness centrality. Betweenness centrality is a measure of how often a given node falls along the shortest path between two other nodes, and the score is a single value for each node in the network. In our study, the three nodes with the highest scores were 27, M057 and 4, whereas M026, R12, 30 and M056 ranked only 4th, 7th, 8th and 9th. We reasoned that a node has power because it can threaten to stop transmitting, but this threat works only if the other nodes cannot easily create new ties to go around the recalcitrant node. Fig. 1 shows that 27, M057 and 4 did not have this threat capability. Moreover, the betweenness centrality method did not successfully identify actors M026 and 30, who played an important potential role in mediating HIV transmission between different subgroups. Therefore, we used the cohesive subgroup analysis method, which applies subgroup concepts, to identify bridging actors.
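For readers unfamiliar with the measure, betweenness centrality on an unweighted, undirected network can be sketched in Python with Brandes' algorithm. The code below is an illustration only. It does not reproduce the scores reported in this study, and the function name and the toy graph are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting the farthest nodes first.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: total / 2 for v, total in score.items()}


# Hypothetical toy network: two triangles joined by the single tie C - D.
toy = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
scores = betweenness(toy)
ranking = sorted(scores, key=lambda v: (-scores[v], v))
```

In the toy network every shortest path between the two triangles passes through C and D, so these two nodes receive the highest scores and the remaining nodes score zero.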
From our results, we can infer that many subgroups in the HIV transmission network among MSM in Guangzhou were connected through shared members. It is essential to recognize the bridging population that mediates the transmission of HIV between different subpopulations of MSM.
Using the connectivity-based cohesive subgroup analysis known as the lambda sets method, we detected 17 lambda sets. Because lambda sets generate a series of groups that are nested hierarchically within each other, the data analyst is able to choose the level of detail to analyze [28]. Five nodes, {26, M050, 4, 27, M057}, had a minimum of 10 independent paths linking any two of them, which means that all five are difficult to disconnect by means of edge removals. They appeared together in eight different sets, indicating that these nodes were important within the group and possibly took on some kind of leadership role, which can be measured by degree centrality. Indeed, these five nodes had the highest degree centrality in the HIV transmission network of 184 nodes, but lambda sets analysis gives a clearer understanding of their position and role. They were active only in one larger subgroup of the transmission network rather than across the whole network. The effect of an intervention may therefore be limited to small groups if the intervention program identifies peer leaders by calculating degree centrality in the target network. In our study, there were at least three independent subgroups whose members were closely connected with each other. Therefore, it is vital for HIV prevention and control to determine the subgroups with different characteristics in the HIV transmission network among MSM.
Centrality, defined as the number of connections with the other nodes in the network, has been widely shown to be associated with the dynamics of transmission of infectious diseases [30,31]. However, centrality measures are simple but may be less effective because they neglect the whole structure of the network [32,33]. Our analysis above shows that identifying key nodes by calculating centrality can easily overlook some important nodes and subgroups. Therefore, we used the more stable cohesive subgroup method, based on accessibility and connectivity among members, to detect a collection of key nodes.
In recent years, HIV incidence among younger Chinese MSM has been significantly higher than that among older MSM [34]. However, based on our results, young MSM were less likely than older MSM to promote the wide spread of HIV. Intervention activities for young MSM should therefore focus on improving their self-protection awareness. On the other hand, a large-scale systematic analysis in China showed that HIV prevalence among MSM was highest in those aged 50 years and older [35]. Accordingly, interventions for older MSM should focus on promoting HIV testing and antiretroviral therapy.
At first sight, it appears easy to identify cohesive subgroups simply by looking at the visualized network. However, if the number of members in the network is too large, some group members or subgroups would be missed by visual inspection alone.
The main limitation of this study was the small sample size. The network used to analyze structural characteristics was a partial network, so the number and scale of the subgroups may have been underestimated and some key nodes may not have been identified. Because of the limited sample size, we did not obtain enough information about the key nodes other than age. Research with larger samples is needed to explore the demographic and behavioral characteristics of key nodes. In addition, an HIV molecular transmission network cannot completely represent a social network. However, molecular procedures can
Figure 1 Network diagram of 75 nodes that had at least one relationship tie with another among 184 sequences