Bayesian Hierarchical Network Autocorrelation Models for Estimating Direct and Indirect Effects of Peer Hospitals on Outcomes of Hospitalized Patients

When an hypothesized peer effect (also termed social influence or contagion) is believed to act between units (e.g., hospitals) above the level at which data is observed (e.g., patients), a network autocorrelation model may be embedded within a hierarchical data structure thereby formulating the peer effect as a dependency between latent variables. In such a situation, a patient’s own hospital can be thought of as a mediator between the effects of peer hospitals and their outcome. However, as in mediation analyses, there may be interest in allowing the effects of peer units to directly impact patients of other units. To accommodate these possibilities, we develop two hierarchical network autocorrelation models that allow for direct and indirect peer effect pathways between hospitals when modeling individual outcomes of the patients cared for at the hospitals. A Bayesian approach is used for model estimation while a simulation study is used to assess the performance of the models and sensitivity of results to different prior distributions. We construct a United States New England region patient-sharing hospital network and apply our Bayesian hierarchical models to study the diffusion of robotic surgery and hospital peer effects in patient outcomes using a cohort of United States Medicare beneficiaries in 2016 and 2017. The comparative fit of models to the data is assessed using Deviance information criteria tailored to hierarchical models that include peer effects as latent variables.


Introduction
The network autocorrelation model (NAM) involves the study of relationships among social units and their interdependent behaviors (O'Malley and Marsden, 2008).For instance, as described in Doreian (1980) and Friedkin (1990), a classic linear NAM is specified as: Y = ρW Y + Xβ + ε, ε ∼ N 0, σ 2 I , where Y is a vector of outcomes, W is a matrix whose elements represent social ties between actors, X is a matrix of covariates, ε is the error term and ρ quantifies the direct peer effect between subjects.Doreian (1980) also describes an alternative model in which the error term rather than the outcome variable is interdependent: Y = Xβ + ε, with ε = ρW ε + ϑ.This model is also well known as a simultaneously autoregressive (SAR) model.Furthermore, Friedkin (1990) introduces a model which includes both interdependent exogenous and endogenous forms of social influence: Y = ρ 1 W 1 Y + ρ 2 W 2 Z + Xβ + ε, where Z is a column vector for an exogenous variable.However, in the current literature on social network analysis, none of these models examined the interdependence among the actors when they are at a higher level in a hierarchical data structure than the units on which observations are made.Such a scenario may arise in a study of peer effects among hospitals in which the goal is to determine whether patient outcomes are directly impacted by peer hospitals.Meeting such a goal is important as knowing whether a hospital's adoption of a technology impacts the outcomes of a greater population of patients than just their own patients is important for policy-makers to know in order to make decisions regarding the priority of different incentive programs aiming to improve the quality of patient care and outcomes.
In Chen and O'Malley (2024), we delved into the research conducted by Dong and Harris (2015) on hierarchical spatial autoregressive models (HSAR) that account for hierarchical spatial data structures involving geographic units.Compared to network data, spatial data typically has a simpler typology in which distances between points or areas are compliant with the triangle inequality.Additionally, in this framework, the peers of geographic units can only exert indirect influence on individuals in the focal geographic unit through spatial connections at the geographic unit level as opposed to directly impacting individuals in the focal geographic unit.
In this paper, we first develop the basic hierarchical network autocorrelation model by adapting the HSAR in Dong and Harris (2015) to social network data assuming the peer effects of actors (e.g., hospitals) higher in the hierarchical structure than the level at which observations are made (e.g., patients).Second, we develop a novel extended hierarchical network autocorrelation model that includes an extra parameter to allow direct inter-level influence of hospitals on patients of other hospitals.This extended model relaxes the "no direct effect" restriction of the basic HSAR model in which peer hospitals may indirectly impact the patients from the focal hospital through their impact on the focal hospital (i.e., an indirect effect of peer hospitals) but does not allow peer hospitals to directly impact patients of the focal hospital.The basic HSAR model can be considered a special case of the extended model in which direct impact is not allowed.We study the mean and variance of the observation-level outcomes as a function of these two network autocorrelation parameters to gain insights into the mechanisms that they represent.The adaptation of HSAR to social network data has not been studied in the literature to date while the extension of the model to allow for direct (across-level spillover) effects is an entirely new topic.
Due to the complexity of the hierarchical network structure, we complete a Bayesian specification of the model and use Bayesian computational methods to fit each of our hierarchical network autocorrelation models.A series of simulation studies quantifies the properties and demonstrates the performance of Bayesian posterior median estimators of the model parameters including the autocorrelation parameters under different prior distributions; the sensitivity of posterior inferences to the prior distribution assumed for the focal peer effect parameter ρ is of particular interest.To alleviate concerns with commonly-used uniform priors for ρ, we also develop a new prior that imposes uniformity on a natural transformation of ρ.
Our study has three main methodological contributions.First, we develop two hierarchical network autocorrelation models assuming the peer effects of actors operate at a higher level of the model than the observation level.Further, we allow for direct and indirect across-level peer effects of actors.Third, to explore and interpret the peer effects, we assess the functional dependence of the marginal mean and variance of the outcomes on the network parameters, density and covariates, compare the two models in terms of model fit and results for the robotic surgery application, develop novel priors for the model parameters, including for a transformed uniform prior distribution for ρ designed to be both a non-informative prior but also to aid model estimation, and explore the sensitivity of posterior inferences to the prior distribution for ρ.This paper extends our paper (Chen and O'Malley, 2024) published in the Proceedings of the Complex Networks 2023 Conference.We further explore the impact of higher level covariates of both the focal actor and the peer actors on observation level outcomes in Section 2, thereby extending our models to inter-level spillover effect models; demonstrate our Bayesian estimation approach in greater detail with the development and derivation of prior distributions for the model parameters and the corresponding posteriors in Section 3; assess the performance of our Bayesian estimation approach using a weighted network whose distribution of edge weights is matched to that in our empirical robotic surgery example using simulations and compare the ensuing findings to those under the binary-valued network in Section 4; and describe the construction of the patient-sharing hospital network used in the robotic surgery example in Section 5.
Our motivating example is an observational study with the objective of understanding the full impact of the adoption of robotic surgery on the time to discharge from hospital of patients undergoing prostatectomy surgery.Robotic surgery, as a robotically-assisted and minimally-invasive procedure, is commonly used in prostatectomy for prostate cancer and also assists in the treatment of lung cancer, kidney cancer and colorectal cancer (Lee, 2009;Chandra et al, 2015;Mirnezami et al, 2010;Novellis et al, 2017).Several advantages of robotic surgery such as shorter hospital stays, less pain, and lower risk of infection has been discussed (Barbash and Glied, 2010).Using the 2016 United States (US) fee-for-service Medicare claims data, we construct a US New England region hospital network for patients with prostate, lung, kidney and colorectal cancer.We study the peer effects among hospitals on prostatectomy time to discharge post-surgery of US Medicare patients in 2017 to allow a lagged peer effect of network interdependency and to partially protect inferences against reverse causality.
The findings from applying our models to the robotic surgery network and hospital attribute data will potentially assist policy-makers wanting to provide incentives to hospitals to adopt new medical technologies that are beneficial to patients.In particular, the novel extended hierarchical network autocorrelation model will provide insights into whether a hospital's adoption of technologies generally benefits patients in a local area (e.g., by improving general standards of surgical quality including strengthening infection control measures in emergency rooms) such that patients who receive surgeries at other hospitals also benefit.

Notation and models
Throughout this paper, the term "ego" refers to the focal actor being studied while the term "alter" refers to the actors connected to the ego in a network, also referred to as "peers".

Hierarchical network autocorrelation model
In our adaptation of the HSAR in Dong and Harris (2015), we first assume that peereffects only act on individual subjects generating observations through their impact on the cluster-effects of the units (network actors) in which the individuals are grouped, such as in the following model: where ε ∼ N 0, σ 2 I N , τ ∼ N 0, ω 2 I g , Y is a vector of length N containing the values of a response variable for N observations, Z is a N × k matrix for k observation level covariates whose first column is a vector of 1 s corresponding to the intercept parameter, X is a g × l matrix for l cluster level covariates, δ is a vector of length g representing the random effect of network actors and B is a N × g matrix linking the random effect δ back to Y .In addition, ε and τ represent the errors at the observational and cluster levels and W is a g × g matrix quantifying the relationships between the actors in the associated network.The ij th entry of W , W ij , represents the influence of actor j on actor i.
The matrix W is constrained to be a non-negative row-normalized matrix, reflecting the non-existence of negative influences and that relative exposures are the conduit through which social influence transmits.The diagonal of W consists of zeros as self-ties are not permitted in the network.The focal parameter ρ is the peer effect corresponding to the indirect effect of alters on the outcomes of individuals in the role of the ego.
Letting A = I g −ρW , to ensure A is non-singular and the determinant of A, |A| ̸ = 0, the range of ρ needs to be restricted.Following Anselin (1988) and LeSage (2000), we restrict the parameter space of ρ to (1/λ min , 1/λ max ), where λ max and λ min are the maximum and minimum eigenvalues of the row-normalized W .For a row-normalized W , 1/λ max = 1 and 1/λ min ≤ −1 (Stewart, 2009) with the value of 1/λ min becoming more negative with increasing network density.Network density equals M/(g(g − 1)) for directed networks and 2M/(g(g − 1)) for undirected networks, where M is the observed number of ties and g(g − 1) is the number of possible directed ties.
We compute the marginal mean and variance of Y to explore and interpret the peer effect.If A is non-singular, the marginal mean and variance satisfy: Applying the Neumann series, when the norm |ρW | < 1 it follows that: The above demonstrates that both the marginal mean and variance of the model depend on ρ and that A −1 is an infinite-order polynomial function of ρ and W .

Extended hierarchical network autocorrelation model
A restriction on the model in ( 1) is that conditional on δ i there is no direct dependence between the vector of observations Y j and Y i for actors j and i, respectively, for any j ̸ = i.In practice, one could imagine situations in which alters may directly influence the individuals associated with the ego in ways other than through their impact on the ego.Such situations may arise when the observational level network of primary interest is unmeasured.For example, the patients of one hospital may benefit from improved quality of care at a peer hospital through the patients of the peer hospital, incentivising better health behaviors in the patients of another hospital, or that peer hospitals directly impact the patients of a hospital by sharing resources.To allow for this possibility, we introduce an extended hierarchical network autocorrelation model with an extra parameter quantifying direct across-level influence of hospitals on patients of other hospitals: where ε ∼ N 0, σ 2 I N , τ ∼ N 0, ω 2 I g , and α is an unrestricted parameter that quantifies the direct network effect of alters on the outcome of individuals from the ego.The parameters ρ and α have an analogy to indirect and direct effects in a mediation analysis; the indirect effect of peer hospitals outcomes on the outcome of a focal hospital through their impact on that hospital's performance is quantified by ρ while the direct effect that acts independently of the focal hospital is quantified by α.
The matrices W 1 and W 2 in (2) could represent different types of relationships between actors; e.g., W 1 could be built on geographic distances between hospitals while W 2 could be built on patient-sharing information between hospitals.With only a single source of network relationship information, in our study we set we compute the marginal mean and variance of Y under (2): To help interpret α and ρ and differentiate the behavior and properties of the model in (2) from those of the model in (1), as described in Section 2.3 we numerically evaluated these expressions across a range of values of α and ρ and visualized the results.

Illustration of marginal mean and variance of extended model with simulated data
To gain insight into the effects captured by ρ and α under the model in (2), we simulated 100 datasets under this model assuming a network containing 50 hospitals and 30 individuals per hospital (model ( 1) is a special case of model ( 2) where α = 0 making it sufficient to only consider model ( 2 ).Similarly, we evaluate the relationship between α and the marginal mean and variance of the model (Figure 1.c.and 1.d).Finally, we investigate the association between the network density d and the marginal mean and variance of the model (Figure 1.e.and 1.f.). Figure 1.a.and 1.b.show that the magnitude of the marginal average mean and variance of Y increases when the value of ρ increases and accelerates exponentially upwards when ρ approaches its upper bound of 1.When ρ approaches 1, the determinant of A approaches zero and the entries of A −1 become increasingly large leading to extreme exponential behavior.Similar results are found for negative values of ρ. f., we find that the marginal mean and variance display volatile behavior when the network density is smaller than 0.1.When the network density is small, e.g., d < 0.1, the simulated network often contains isolated nodes.To overcome computational issues in matrix row-normalization that occur with isolated nodes, our specification of W assumes that isolates are equally influenced by all other actors.Accordingly, the volatile behavior of the marginal mean and variance of Y as density approaches 0 is due to the rapid escalation in the prevalence of isolates.
A special case of our models is the spillover effects model in which an individual's outcome is influenced by their alters' covariates.Therefore, to illustrate how the ego and alter covariates associate with the outcome, we compute and plot the change in the expected value of mean of E (Y .j ) for subject j under (i) a 1 unit increase in a covariate X j (i.e., the change X j → X j + 1) and (ii) a 1 unit increase in the mean of subject j's alters covariates X −j (i.e., the change X −j → X −j + 1).The corresponding changes of the jth ego's mean are , respectively.Analogous simulations under the two scenarios are conducted with respect to ρ and α (see Figure 2.a. and 2.b. for ρ and α, respectively, when X j → X j +1 and likewise Figure 2.c. and 2.d.when X −j → X −j +1).For illustration we consider the case when j = 1, the individuals within hospital 1.
Figure 2.a.shows that the average value of E (Y .1 ) due to a 1 unit increase in the ego's covariate (i.e., X 1 → X 1 + 1) increases with increasing values of ρ and resembles an exponential trend as ρ approaches its upper bound.Similarly, in Figure 2.c., the average values of the mean of E (Y .1 ) due to a 1 unit change in the weighted average of the alters' covariates (i.e., X −1 → X −1 + 1) increases with increasing values of ρ and follows an exponential trend as ρ approaches its upper boundary.Although these two figures depict similar exponential behavior, the magnitude of the changes differ as evinced by the differing scales on the vertical axes.Exponential behavior occurs because the determinant of A is close to zero and the elements of A −1 become increasingly large when ρ approaches its upper bound.Additionally, in Figure 2.b. and 2.d., we find that the change of average values of mean of E (Y .1 ) due to the ego's own covariate values and that of their peers increases are linear with increasing α but differ in the scale of the changes.We adapt estimation approaches for linear NAMs to our hierarchical NAMs.A simulation study of Dittrich et al (2017) shows that Bayesian modeling and estimation outperforms maximum likelihood estimation with respect to bias and the level and width of the credible intervals.In our analysis, we use a Bayesian approach to estimate the models due to its flexibility and reliability.
A Bayesian analysis relies on specification of a prior distribution, a likelihood function, and the derivation of a posterior distribution.The prior distribution, p(ϕ) contains the prior information or beliefs for the model parameters ϕ.The likelihood function f (y | ϕ) summarizes the information in the data.By Bayes rule, the posterior distribution satisfies: p(ϕ | y) ∝ p(y | ϕ) p(ϕ).Bayesian point and interval estimates may be derived from the posterior distribution using direct Monte Carlo (or closedform) probability evaluations.

Prior distributions for Bayesian modelling
Despite the importance of prior distributions in Bayesian analysis, there has been little study of the properties of different prior distributions for ρ within the hierarchical network autocorrelation model framework.
As discussed in Chen and O'Malley (2024), we propose 3 prior distributions for ρ with different ranges and shapes to investigate the sensitivity of the posterior distribution to the prior for ρ.These priors include: 1) A uniform prior p(ρ) ∝ 1 over the range 1/λ min < ρ < 1/λ max (as previously discussed, 1/λ max = 1 and 1/λ min becomes much smaller than −1 when network density increases).2) A uniform prior for ρ with support (−1, 1), a symmetric and more restricted parameter space that matches that of correlation coefficients such as the Pearson and Spearman's rank correlation coefficients.3) As discussed in Chen and O'Malley (2024), an improper uniform prior on a parameter equal to the following transformation of ρ to the entire real line: The above expression corresponds to the generalized logit function.Therefore, the implied prior for ρ is given by: having positive support for ρ ∈ (1/λ min , 1/λ max ).Our transformed uniform prior emulates Jeffery's prior applied to classic linear NAMs (Dittrich et al, 2017).Additionally, the informative and weakly informative normal prior for ρ discussed in Dittrich et al (2017) for classic linear NAMs may also be considered in our models, i.e., p(ρ) ∼ N 0.36, 0.19 2 and p(ρ) ∼ N 0.36, 0.7 2 .However, we focus on noninformative priors for ρ, with the objective of identifying priors that could enable stable computation of statistical inferences while introducing minimal information into the analysis.
For further illustration and comparisons, we use the network constructed in our motivating robotic surgery example with 1/λ max = 1 and 1/λ min = −1.660 to plot the density function of our proposed three priors for ρ.As shown in Figure 3, the transformed uniform prior is "U-shaped" and has more prior mass at its boundary values {1/λ min , 1/λ max } than under the two uniform priors of ρ.
To complete a Bayesian specification of the model, we assign improper flat priors on σ and ω; i.e., σ 2 , ω 2 ∝ 1/(σω).Alternatively, a half Cauchy prior can be placed on ω to take advantage of desirable properties it has for hierarchical models (Gelman, 2006).Because the development of a half Cauchy prior for ω in relation to NAMs has not been discussed in the literature, we present the derivation in the Appendix.The results using the half Cauchy prior are similar to those for the uniform prior on ω in our analysis.We specify the flat prior p(θ, β) ∝ 1 for (θ, β), although indistinguishable results are found from assigning normal priors centered at 0 with large variances ("noninformative normal priors") for θ and β.In addition, for the model in (2), we assign a flat prior p(α) ∝ 1 to α with no restriction on its range.
and the conditional prior distribution of δ as: 2ω 2 Due to the complexity of the model and the large number of parameters, the joint posterior distribution is non-standard and direct sampling from it is intractable.Therefore, we use a hybrid Gibbs-sampling Metropolis-Hastings algorithm that sequentially draws from the conditional posterior distribution of each parameter given the data and current values of all other parameters (Geman and Geman, 1984).
As seen in the probability distributions functions presented below, due to conjugacy the conditional posterior distributions for each of σ 2 , ω 2 , β, θ and δ have well-known closed-forms: making sampling from them straightforward.In contrast, the conditional posterior of ρ: does not have a form conducive for direct sampling.Therefore, we approximate (3) and use a Metropolis Hastings algorithm with an independent candidate generating function for direct sampling (Dittrich et al, 2017).As demonstrated in the derivation in the Appendix, the resulting candidate generating distribution of ρ when p(ρ) ∝ 1, 1/λ min < ρ < 1/λ max , is the truncated normal distribution (T N ): where λ i for i = 1, . . ., g are the eigenvalues of W .Under p(ρ) ∝ 1, −1 < ρ < 1, the implied candidate generating distribution of ρ is T N (µ * , V * ) with support −1 < ρ < 1.For the transformed uniform prior of ρ, we use the same candidate generating distribution in Equation (4) to sample ρ.
We use the same prior distributions and MCMC sampling procedure as for the model in (1) to fully specify and fit the model in (2).The conditional posterior of α is then: where K = Y − Zθ − Bδ.The derivation of the conditional posteriors of all other parameters of the model in (2) are presented in the Appendix.

Simulation study
As discussed in Chen and O'Malley (2024), we conducted a simulation study involving 1,500 hypothetical patients receiving care from 50 hospitals (30 patients per hospital) in a hospital network to evaluate the performance of the models in (1) and ( 2) under different priors for ρ.Specifically, we generated undirected binary-valued network matrices representing whether peer hospitals share patients or not with network density (d) = 0.2, 0.4, 0.6, 0.8 using the R package "sna".In each case, the resulting adjacency matrix was row-normalized to form the matrix W .We considered the following values of ρ = −0.5, −0.2, 0, 0.2, 0.5 and α = 2, 5.For each model, three patient-level covariates plus an intercept and three hospital-level covariates were included with the elements of the matrices Z and X consisting of random draws from the standard normal distribution.The true values of σ 2 and ω 2 were set to 1.For each scenario, we generated 500 simulated datasets.We used the posterior median as the Bayesian point estimator because the conditional posterior distributions of ρ and α tend to be skewed.Hence, the posterior median is a better measure of the center of the distribution than the posterior mean.The bias of the posterior median estimator of ρ was computed by evaluating bias ρ = 1 500 500 s=1 (ρ s − ρ true ), where s is the simulation counter, ρs is the posterior median of ρ in simulated dataset s, and ρ true is the value of ρ used in generating each of the 500 simulated datasets.We similarly calculated the frequentist mean squared error (MSE) of ρs as M SE ρ = 1 500 500 s=1 (ρ s − ρ true ) 2 .Finally, we computed the coverage rate of the 95% equal-tailed credible interval for ρ by evaluating coverage ρ = 1 500 500 s=1 I (ρ true ∈ CI ρs ), where I(e) is the indicator function equal to 1 if the event e is true and 0 otherwise, and CI ρs = (q ρ,s,0.025, q ρ,s,0.975 ) is the equaltailed 95% credible interval of ρ in simulated dataset s in which q ρ,s,1−κ is the 1 − κ quantile of the posterior distribution for ρ in simulated dataset s.We estimate CI ρs non-parametrically by extracting the 2.5% and 97.5% smallest to largest ordered values in the retained sample of 500 posterior draws of ρ in the analysis of dataset s.We used the same approach to compute the bias, MSE and coverage rate of α.
The simulations reveal that our Bayesian estimation approach performs well with respect to bias, MSE and the coverage rate of the 95% equal-tailed credible intervals of ρ and α (for detailed results, see the Supplementary document of the manuscript).The bias of ρ and α increase with increasing network density, consistent with the findings for classic linear NAMs in Mizruchi and Neuman (2008) and Dittrich et al (2017).By comparing the performance of the three different priors for ρ, we found that the range of the prior has more impact than its shape on the estimated values.In particular, due to the asymmetric support of ρ, 1/λ min < ρ < 1/λ max , the full-range uniform prior for ρ yields a posterior median estimator of ρ exhibiting an asymmetric bias pattern either side of 0. In contrast, bias is much more symmetric around 0 under the uniform (−1, 1) prior for ρ, especially when network density is large.For example, for model (1) with the uniform (1/λ min , 1/λ max ) prior for ρ, when d = 0.8, bias = 0.025 if ρ = −0.5 and bias = −0.298if ρ = 0.5.Under the uniform (−1, 1) prior for ρ, when d = 0.8 and ρ = −0.5 we obtain bias = 0.241 while if ρ = 0.5 then bias = −0.274.These results imply that as network density increases in a binary-valued network, it becomes more challenging for the model to identify ρ.Intuitively, as density increases the information in the data about ρ declines due to the vast number of connections in the binary valued network making the variation across the actors in the extent to which they are more or less connected with other actors much lower than when density is low.It is under this high density scenario that slight differences in the non-informative prior specification for ρ nontrivially impact the resulting posterior distribution.
We also conducted a simulation study to evaluate the performance of model ( 1) and (2) for the motivating example in Section 5 by building an undirected weighted edge network whose network density and distribution of edge weights are similar to the network in the motivating example (45 hospitals and 1,306 patients with network density = 0.779 and the mean and standard deviation of the weighted edge ≈ 226 and 664, respectively).In the simulated data sets, we set the number of hospitals to 50 with 30 individuals per hospital (the total number of individuals, N = 1, 500), and the network density to 0.8.We then generated edge-weights from the Gamma(0.1, 2000) distribution.Other settings were maintained as for the simulations in Chen and O'Malley (2024).The results for model ( 1) and ( 2) are shown in Table 1 and 2, respectively (because 1/λ min is close to −1, some results are identical for Unif 1 and Unif 2 prior).
Compared to the binary-valued network, we observed smaller bias and MSE of the estimators of ρ and α under the weighted network.In addition, we found bias is more symmetric around 0 under the uniform (1/λ min , 1/λ max ) prior.For example, in Table 2, using the uniform prior p(ρ) ∝ 1 over 1/λ min < ρ < 1/λ max , the bias of ρ is 0.005 when ρ = −0.2 and is −0.013 when ρ = 0.2, which emulate the results obtained when p(ρ) ∝ 1 over −1 < ρ < 1.This is because: 1) W based on the weighted edge network may contain more information about relationships between the actors in a network than the binary-valued network; 2) Under these simulation settings, 1/λ min is close to -1 and 1/λ max = 1; therefore, the interval support for the uniform prior (1/λ min , 1/λ max ) and the transformed uniform prior is more symmetric.

The impact on patient quality of hospitals' adoption of robotic surgery
Robotic surgery, with its advantages of shorter hospital stays, less pain and faster recovery, has been widely used on patients suffering from many health conditions, Table 2 Bias, mean squared error (MSE), and 95% coverage rates (Rate) of ρ using uniform priors (Unif 1 for 1/λ min < ρ < 1/λmax and Unif 2 for −1 < ρ < 1) and transformed uniform prior (T Unif) and assume an improper flat prior for α for model (2).particularly in the cure of prostate cancer, lung cancer, kidney cancer and colorectal cancer.In our study, we are interested in whether the extent to which peer hospital adoption of robotic surgery influences the patient's prostatectomy time to discharge post surgery at a hospital.We first construct a "New England region" of the six states in the Northeastern US (Maine, New Hampshire, Vermont, Massachusetts, Connecticut, and Rhode Island) patient sharing hospital network.As a substantial sub-region of the US, the New England region provides an adequate sample size for our analysis.Consequently, analyses focused on the New England region are considered to be based on a sufficient magnitude and richness of data to be informative and have a realistic chance of detecting effects of clinical significance.Following the approach introduced in Moen et al ( 2016) and O'Malley et al ( 2020), for each pair of physicians we compute the weighted edges between physicians by summing the geometric means of the number of visits the same patient made to each physician in the pair across all patients suffering one of these four cancer types.To clarify, let a ijl and a ikh denote the number of visits by patient i to physician j in hospital l and physician k in hospital h, respectively.The weighted edge between physician j in hospital l and k in hospital h across n shared-patients can then be represented as e jlkh = n i=1 √ a ijl a ikh .We use the method introduced in Bynum et al (2007) to assign physicians to hospitals and then compute weighted edges between hospitals by aggregating the physicians' edge weights over the physician dyads spanning each pair of hospitals.That is, we compute the weighted edge between hospital l and h as E lh = m jlkh=1 e jlkh with the summation over all m pairs of physicians bridging hospitals l and h.
We use an example to depict the computation of weighted edges between physicians and aggregated weighted edges between hospitals (Figure 4).As demonstrated above, the weighted edge between physicians is computed by summing the geometric means of a patient's number of visits to each physician in a dyad across all patients.In the example in Figure 4, the weighted edge between physician A and C is √ 2 × 4 + √ 3 × 6 = 7.071 and the weighted edge between physician B and C is √ 4 × 8 + √ 5 × 10 = 12.728.The weighted edge between hospitals is then computed by aggregating the edges over the physician dyads bridging each pair of hospitals.Because 2 pairs of physicians (A, C and B, C) span hospital 1 and 2, the weighted edge between hospital 1 and 2 is 7.071+12.728= 19.799.The resulting undirected weighted hospital network matrix is row normalized to form the W used in the application of models ( 1) and ( 2) to these data.We focus on a sub-group of hospitals in the network that are equipped with a robotic surgery system and that conducted more than 5 prostatectomies per year to study the association between the peer hospital adoption of robotic surgery and patients' prostatectomy time to discharge post-surgery.Medicare health insurance claims data from 2016 is used to build the patient-sharing hospital network and evaluate hospital covariates while the 2017 Medicare data is used to evaluate all other patient outcomes and covariates.In our analysis, we include patient's age, disability, whether receiving a robotic surgery and the Charlson Comorbidity Index (Charlson et al, 1994) as covariates.Because most patients ( 96%) have a Charlson Comorbidity Index of 0, we converted the Charlson Comorbidity index to a binary variable using 0 as the threshold; i.e., patients with comorbidity versus patients without comorbidity.For hospital-level covariates, we include the number of beds, percentage of robotic prostatectomy and number of peer hospitals in the network.For the outcome, patient's prostatectomy time to discharge post-surgery, we use a log plus 1 transformation of the data to reduce skewness.In addition, we standardize all continuous covariates to ensure estimates and operating characteristics of these variables are on the same scale.The cohort contains 45 hospitals and 1,306 patients with the network density d = 0.779 (1/λ max = 1 and 1/λ min = −1.660).
We report the posterior median estimators and 95% equal-tailed credible intervals of ρ, α and other parameters in Table 3.In addition, we compute the Deviance Information Criterion (DIC) (Spiegelhalter et al, 2002) for model comparison due to its data-determined evaluation of the effective degrees-of-freedom of the model to penalize our Bayesian hierarchical models for model complexity and thus guard against over-fitting when comparing the extended hierarchical network autocorrelation model (model ( 2)) to its base model counterpart in which α = 0 (model (1)).
Table 3 finds similar results for all parameters other than the peer effect parameters ρ and α between the two models.For instance, robotic surgery is associated with shorter hospital stays, while age, disability, and the Charlson Comorbidity index are associated with longer hospital stays.Comparing the two models, we observe a significant change in the estimated value of ρ.With the inclusion of α (α = 1.355 (-2.539, 4.168)), ρ changes from -0.048 to -0.525.The negative value of ρ indicates that peer hospitals' adoption of robotic surgery indirectly associates with shorter hospital stays of patients, whereas α = 1.355 > 0 suggests that peer hospitals' propensity to adopt robotic surgery is directly associated with longer patient hospital stays.The wide credible intervals for both ρ and α overlap 0, revealing that with a very large network density (i.e., 0.779), the information in the data about ρ and α is limited (much more so than if density were lower).The two models have very close DICs with the DIC of model (2) slightly smaller than the DIC for model (1), suggesting that model (2) fits the data better.

Discussion
In this paper, we developed two hierarchical network autocorrelation models to study the direct and indirect peer effects of actors at a higher level of the hierarchical data structure.The novel contributions include the exploration of both direct and indirect peer effects among higher-level actors and the impact of peer actor behavior on an observation-level outcome.In addition, we proposed a Bayesian approach for estimation and compared the performance of the resulting estimators under different prior distributions for model parameters, especially for ρ to gain insights into the sensitivity of the posterior distribution and associated inferences to the prior.
For model (2), we set W 1 = W 2 = W because only a single source of network relationship information is available in our dataset.However, W 1 and W 2 may be different matrices representing different types of connections between actors.This is a limitation of our study and the performance of our models under different types and specifications of W is of great interest for future research.While our focus was on continuous outcomes, an intuitive direction for further research is to generalize the hierarchical and extended hierarchical network autocorrelation models to non-continuous outcomes such as binary, count and rate outcomes.Although our simulation study confirmed that our models are estimable, further study of the relationship between network features and the precision of estimation of peer effects is warranted.
Our model and methodological development were applied to data from an observational study of the diffusion of robotic surgery on the quality of patient outcomes.Although our findings were inconclusive with the credible intervals of ρ and α overlapping 0, a consequence of the densely connected network resulting in their being a modest amount of information in the data about ρ and α, in general our models have the potential to be widely applied and to reveal important scientific findings regarding the impact of the adoption of a health technology by its peer hospitals on the outcomes of patients at the focal hospital, from which important policy recommendations may be derived.
where λ i (i = 1, . . ., g) are eigenvalues of W (Ord, 1975) and ln |A| = ln (1 − ρλ i ).By a second-order Taylor series expansion: where, T N (µ * , V * ) is the truncated normal distribution candidate generating distribution with mean µ * and variance V * : The acceptance ratio γ for a Metropolis-Hastings algorithm with an independence candidate generating function is: where ρ c is the current state of ρ and ρ p is the proposed state of ρ, is randomly generated from the candidate generating distribution with To perform a Metropolis-Hastings step we randomly draw a random variable u from U (0, 1) and accept the proposed state if u ≤ γ, otherwise rejecting the proposed state and staying at current state (i.e., if u > γ).
The conditional posterior of δ is:

Extended Hierarchical network autocorrelation model
The likelihood function of the model ( 2) is: We assign a flat prior for α, that is p(α) ∝ 1 and the prior distributions for the other parameters align with these in model (1).
In addition, the conditional posterior of η is: To sample ξ from its conditional posterior distribution, we used following candidate generating function: The conditional posterior of ρ is: As previously discussed, the candidate generate function of ρ is: In the above, we used model (1) to show the derivation of the half Cauchy prior for ω.A similar construction applies for (2).
) in the simulation).To determine how the marginal mean and variance of the model change with increasing ρ, we plot the average of the mean of the elements of E(Y ) and the average of the diagonal elements of var(Y ) over 100 drawn values on the vertical-axis against ρ on the horizontal-axis (Figure 1.a.and Figure 1.b.
Figure 1.c.reveals a linear decreasing association between the marginal average mean of Y and α while Figure 1.d.shows that the corresponding marginal variance increases with α.From Figure 1.e.and Figure 1.

Fig. 1
Fig.1The marginal mean and variance of the model along with the change of ρ, α and network density

Fig. 2
Fig. 2 The change of the marginal mean of the model following a 1 unit increase of ego's and alters' covariates across a range of values of ρ and α