Neural networks for estimation of facial palsy after vestibular schwannoma surgery

DOI: https://doi.org/10.21203/rs.3.rs-1619106/v1

Abstract

Purpose

Facial nerve damage in vestibular schwannoma surgery is associated with A-train patterns in free-running EMG, which correlate with the degree of postoperative facial palsy. However, anatomy, preoperative functional status, tumor size and the occurrence of A-train clusters, i.e., A-trains appearing suddenly in most channels, may further contribute. In the present study, we examine neural networks that estimate postoperative facial function based on such features.

Methods

Data from 200 consecutive patients were used to train feed-forward neural networks (NN). Estimated and clinical postoperative House and Brackmann (HB) grades were compared. Different input sets were evaluated.

Results

Networks based on traintime, preoperative HB grade and tumor size achieved good estimation of postoperative HB grades (chi2 = 54.8), compared to networks using tumor size or mean traintime alone (chi2 = 30.6 and 31.9). Adding information on a separate intermediate nerve or on detected A-train clusters did not improve performance. Removing A-train cluster traintime improved results (chi2 = 54.8 vs. 51.3), an effect confined to patients without a separate intermediate nerve.

Conclusions

NN based on preoperative HB, traintime and tumor size provide good estimates of postoperative HB grades. The method is amenable to real-time implementation and supports integration of information from different sources. NN could thus enable multimodal facial nerve monitoring and improve postoperative outcomes.

Introduction

Intraoperative monitoring is applied in cerebello-pontine-angle (CPA) surgery to detect and avoid nerve damage. In surgery for vestibular schwannoma (VS), monitoring of free-running EMG in addition to facial motor evoked potentials (MEP) and direct nerve stimulation (DNS) supports preservation of facial and vestibulocochlear function and consequently postoperative quality of life [1, 2]. Monitoring of free-running EMG examines continuous EMG activity, recorded by needle electrodes in the facial muscles, for specific pathological patterns, so-called “A-trains”. The overall quantity of A-trains (“traintime”) has been shown to correlate with the degree of postoperative facial palsy [3, 4]. When fixed risk thresholds are applied, the positive predictive value of the method is approximately 64%, which is comparable to the values published for MEP and DNS [1, 4].

A main factor limiting the predictive value of the method is the occurrence of false-positive cases. These patients show a high amount of A-train activity but do not suffer from severe deterioration of facial function [1, 5, 6]. In a previous study [6], we showed that such patients frequently show a specific anatomical trait, first described as a “split” facial nerve [7]. In these cases, the intermediate nerve takes a course in the CPA separate from the facial nerve and carries motor fibers targeting the facial muscles [6, 8–10]. Irritation or damage of the intermediate nerve then seems to provoke comparably large amounts of A-trains. Potentially due to the low functional importance of intermedius motor fibers, this pathological activity is frequently not accompanied by corresponding clinical deficits [6]. Unfortunately, morphology and frequency of “intermedius” A-trains do not differ significantly from those of “facial” A-trains [11], which precludes a direct differentiation of the two entities. Instead, so-called A-train “clusters”, i.e., A-trains occurring in most of the recording channels within a short time, seem to be more frequent in patients with a separate intermediate nerve [11]. As this difference is mostly visible on a group level, spatial distribution is also not useful to differentiate “false-positive” intermedius A-trains from A-trains that are correlated with postoperative palsy in an individual patient.

The observation of a separate intermediate nerve seems to be dependent on tumor size [11]. While patients with small tumors (Koos 1) rarely show a separate intermedius, the frequency increases with larger tumors (Koos 2 and 3), potentially due to chronic mechanical interaction of the tumor with the facialis-intermedius bundle. In patients with very large tumors (Koos 4) the frequency decreases again, potentially due to thinning of the already small nerve, which would make observation considerably more difficult.

These findings suggest a complex network of interactions between tumor size, intermediate nerve course and surgical manipulation, and of their impact on the amount and distribution of A-train activity and ultimately on the correlation with facial nerve outcome. It is therefore unsurprising that fixed traintime risk thresholds, which are largely independent of tumor size and necessarily do not take the presence of a separate intermediate nerve into account, have limitations. Even in patients without a separate intermedius nerve, complex interactions between tumor size, preoperative deficits of facial nerve function and intraoperative findings may affect estimation of postoperative outcome.

In the current study, we evaluate whether more complex analytical approaches can integrate the range of pre- and intraoperative information to accurately estimate postoperative facial nerve function. Potentially, such procedures could help overcome the issue of false positives in the presence of a separate intermediate nerve and yield projected HB grades rather than a risk of low- or high-grade palsy [4].

We employ machine learning, specifically neural networks, to calculate an outcome parameter similar to House-Brackmann (HB) grades [12] based on traintime, tumor size and preoperative functional status. Neural networks are trained on data rather than explicitly designed, using separate training and validation datasets. Advantages relevant to our application include their ability to integrate different data types and to capture complex, non-linear interactions. While understanding how a successful neural network arrives at its output is notoriously difficult, even a pure black-box approach may have clinical merit if it outperforms estimation based on direct interpretation of traintime or tumor size alone.

The main goal of our study is therefore to provide an improved tool to estimate postoperative outcome with the potential for real-time intraoperative application for facial nerve monitoring.

Methods

Patients

Data from 200 consecutive adult patients who had undergone surgery for vestibular schwannoma between 7/2006 and 8/2016 were selected retrospectively and anonymized. This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the University Hospital Halle (Saale) (Ref. Number 2018-138). All patients whose data were included in the study had given written informed consent for the use of their data in scientific studies. Inclusion criteria were first surgery for VS, availability of complete continuous intraoperative EMG recordings from clinical routine, and facial nerve outcome data from follow-up after at least 6 months. Exclusion criteria were previous irradiation and neurofibromatosis. Mean age was 51 years (range 21 to 80 years); 109 patients were women. Tumor size was Koos 1 in 18 patients, Koos 2 in 57, Koos 3 in 70 and Koos 4 in 55 patients [13]. Median preoperative facial nerve function was House-Brackmann (HB) grade 1 (range 1–3; 3 patients with HB 3) [12]. A separate intermediate nerve was observed intraoperatively in 99 patients.

Recordings

Continuous EMG was recorded during the complete surgical procedure as described previously [3, 4]. In short, 15 mm long non-insulated needle electrodes were placed in parallel in the facial muscles with an interelectrode distance of 5 mm. For each of the 3 main branches of the facial nerve, 4 electrodes were positioned on the operated side. Referencing neighboring electrodes against each other yielded 3 bipolar channels per branch, i.e., 9 channels in total. The ground electrode was placed in the contralateral upper arm. Data were recorded with a Grass-Telefactor 15LT biosignal amplifier (West Warwick, RI, USA) at a sampling rate of approximately 7 kHz and with a 5 Hz high-pass filter.
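The bipolar montage can be illustrated with a short Python sketch. Python is used for all sketches here, although the study itself used Matlab. The sketch assumes that monopolar traces are available as an array of shape branches × electrodes × samples, with the electrodes ordered along each branch. The function name `bipolar_montage` and this layout are hypothetical. In the actual setup, the amplifier performs the referencing.

```python
import numpy as np

def bipolar_montage(monopolar: np.ndarray) -> np.ndarray:
    """Convert monopolar needle recordings into bipolar channels.

    monopolar: array of shape (n_branches, n_electrodes, n_samples), electrodes
    ordered along each facial nerve branch (hypothetical layout).
    Returns shape (n_branches * (n_electrodes - 1), n_samples).
    """
    n_branches, n_electrodes, n_samples = monopolar.shape
    # Reference each electrode against its neighbour: e1-e2, e2-e3, e3-e4
    bipolar = monopolar[:, :-1, :] - monopolar[:, 1:, :]
    # Flatten so channels are ordered branch by branch
    return bipolar.reshape(n_branches * (n_electrodes - 1), n_samples)

# 3 branches x 4 electrodes, 1 s of noise at an assumed 7000 Hz sampling rate
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 4, 7000))
channels = bipolar_montage(raw)
print(channels.shape)  # (9, 7000)
```

Four electrodes per branch give three neighbouring pairs. Three branches therefore yield the nine channels from which per-channel traintime is later computed.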

EMG processing

The recorded 9-channel data were evaluated postoperatively by computer-assisted visual inspection using in-house software. Extending automated marking [3], onsets and offsets of individual A-train patterns were marked. In addition, A-train clusters, defined as A-trains occurring in the majority of recording channels within the same time segment of a few seconds [11], were identified visually. Subsequently, the durations of all A-train events were summed per channel, yielding 9 traintime values for each patient.
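Per-channel traintime, with and without cluster events, can be sketched as follows. The `ATrain` record and both function names are hypothetical illustrations. The automated `flag_clusters` rule (at least 5 of 9 channels within 3 s) is also hypothetical. In the study, clusters were identified visually, and the time segment is described only as "a few seconds".

```python
from dataclasses import dataclass
import numpy as np

N_CHANNELS = 9  # 3 branches x 3 bipolar channels

@dataclass
class ATrain:
    channel: int              # 0..8
    onset_s: float            # seconds from start of recording
    offset_s: float
    in_cluster: bool = False  # marked visually in the study

def flag_clusters(events, window_s=3.0, min_channels=5):
    """Hypothetical automated stand-in for visual cluster marking: flag an
    A-train if A-trains start in >= min_channels distinct channels within
    window_s seconds of its onset."""
    for ev in events:
        channels = {other.channel for other in events
                    if abs(other.onset_s - ev.onset_s) <= window_s}
        ev.in_cluster = len(channels) >= min_channels
    return events

def traintime_per_channel(events, exclude_clusters=False):
    """Sum A-train durations per channel, giving 9 traintime values (s)."""
    tt = np.zeros(N_CHANNELS)
    for ev in events:
        if exclude_clusters and ev.in_cluster:
            continue  # "corrected" traintime skips cluster events
        tt[ev.channel] += ev.offset_s - ev.onset_s
    return tt

# Two isolated A-trains plus a burst in all 9 channels within 4 s
events = [ATrain(0, 10.0, 12.5), ATrain(4, 300.0, 301.0)]
events += [ATrain(c, 600.0 + 0.5 * c, 601.0 + 0.5 * c) for c in range(N_CHANNELS)]
flag_clusters(events)
full = traintime_per_channel(events)
corrected = traintime_per_channel(events, exclude_clusters=True)
print(full.sum(), corrected.sum())
```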

Clinical data

Clinical data were extracted from clinical documentation and included preoperative and immediate postoperative facial nerve function as well as function at follow-up after 6 months, graded according to House-Brackmann [12]. HB grades were checked and, if necessary, corrected by a single experienced evaluator (author JP) to reduce issues of limited interrater reliability of HB grading [14]. Intraoperative observation of a separate intermediate nerve was taken from the surgeon’s documentation.

Relationship to postoperative outcome

The relationship of traintime, tumor size and neural network output with postoperative outcome was evaluated using partial correlation, as applied previously [4]. Due to the ordinal scaling of HB grades, Spearman rank correlations were used. Partial correlations quantify the association between two parameters while controlling for one or more covariates. A statistically significant partial correlation suggests an association that is not explained by the covariates. For example, in a previous study [4], traintime showed a significant partial correlation with postoperative outcome controlled for tumor size, which suggests that traintime is associated with outcome independently of tumor size. Regarding the diagnostic value of the evaluated neural networks for postoperative facial nerve function, a significant partial correlation while controlling for both raw traintime and tumor size would suggest that the networks provide information complementary to traintime or tumor size alone.
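A minimal sketch of the Spearman partial correlation follows. It assumes the common definition: the Pearson correlation between the residuals of the rank-transformed variables after linear regression on the rank-transformed covariates. The text does not state which software or formula was used. The function name and the synthetic data are illustrative only.

```python
import numpy as np
from scipy import stats

def partial_spearman(x, y, covariates):
    """Spearman partial correlation of x and y, controlling for covariates.

    covariates: one 1-D array or a sequence of 1-D arrays (same length as x).
    Correlates the residuals of rank(x) and rank(y) after regressing both on
    the covariate ranks (plus intercept). Returns (r, p), with p from a
    t-test on n - 2 - k degrees of freedom.
    """
    rx, ry = stats.rankdata(x), stats.rankdata(y)
    rz = np.column_stack([stats.rankdata(c) for c in np.atleast_2d(covariates)])
    design = np.column_stack([np.ones(len(rx)), rz])
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    r = np.corrcoef(res_x, res_y)[0, 1]
    df = len(rx) - 2 - rz.shape[1]
    t = r * np.sqrt(df / (1.0 - r**2))
    p = 2.0 * stats.t.sf(abs(t), df)
    return r, p

# Synthetic data: traintime and HB both depend on tumor size, not on each other
rng = np.random.default_rng(1)
koos = rng.integers(1, 5, size=200)
traintime = 10.0 * koos + rng.normal(0, 5, size=200)
hb = koos + rng.normal(0, 1, size=200)
raw_r, _ = stats.spearmanr(traintime, hb)
part_r, part_p = partial_spearman(traintime, hb, koos)
print(f"raw r = {raw_r:.2f}, partial r = {part_r:.2f} (p = {part_p:.2f})")
```

In this synthetic example, the raw correlation is driven entirely by tumor size, so the partial correlation shrinks towards zero. A significant partial correlation in real data would instead point to information beyond the covariate.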

Construction and evaluation of neural networks

Feed-forward networks with different input parameters, a single hidden layer and two simultaneous outputs (immediate postoperative and follow-up HB grades) were constructed using the feedforwardnet function of the Matlab Deep Learning Toolbox (Matlab R2021a, The MathWorks, Natick, MA, USA). The number of hidden-layer neurons was set to the number of inputs, e.g., 11 for 11 input parameters (9 channels of traintime, Koos tumor size and preoperative HB grade).
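The network architecture can be approximated in Python with NumPy and SciPy. This is a sketch under stated assumptions, not the Matlab implementation:

- Tanh hidden units and linear outputs mirror the default transfer functions of Matlab's `feedforwardnet`.
- Levenberg-Marquardt training is delegated to SciPy's `least_squares(method="lm")`, which minimizes the sum of squared residuals and hence the MSE.
- Matlab's input normalization, internal data division and early stopping are omitted.
- The helper names, the initialization scale and the `max_nfev` cap are hypothetical choices.

```python
import numpy as np
from scipy.optimize import least_squares

def n_params(n_in, n_hidden, n_out):
    """Number of weights and biases in a one-hidden-layer network."""
    return n_hidden * n_in + n_hidden + n_out * n_hidden + n_out

def unpack(theta, n_in, n_hidden, n_out):
    """Split the flat parameter vector into layer weights and biases."""
    sizes = [n_hidden * n_in, n_hidden, n_out * n_hidden]
    w1, b1, w2, b2 = np.split(theta, np.cumsum(sizes))
    return w1.reshape(n_hidden, n_in), b1, w2.reshape(n_out, n_hidden), b2

def forward(theta, X, n_hidden, n_out=2):
    """(n_samples, n_in) -> (n_samples, n_out): tanh hidden layer, linear output."""
    w1, b1, w2, b2 = unpack(theta, X.shape[1], n_hidden, n_out)
    return np.tanh(X @ w1.T + b1) @ w2.T + b2

def train_lm(X, Y, seed=0, max_nfev=10_000):
    """Fit a network with as many hidden neurons as inputs by Levenberg-Marquardt."""
    n_in, n_out = X.shape[1], Y.shape[1]
    n_hidden = n_in  # e.g. 11 hidden neurons for 11 inputs
    rng = np.random.default_rng(seed)
    theta0 = rng.normal(scale=0.5, size=n_params(n_in, n_hidden, n_out))

    def residuals(theta):
        # Both outputs (postoperative and follow-up HB) share one residual vector
        return (forward(theta, X, n_hidden, n_out) - Y).ravel()

    # 'lm' wraps MINPACK and needs at least as many residuals as parameters
    fit = least_squares(residuals, theta0, method="lm", max_nfev=max_nfev)
    return fit.x, n_hidden

# Synthetic example: 150 training cases, 11 scaled inputs, 2 outputs
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 11))
Y = np.column_stack([X[:, :9].mean(axis=1) + X[:, 9], X[:, 9] + 0.5 * X[:, 10]])
theta, n_hidden = train_lm(X, Y)
print(forward(theta, X, n_hidden).shape)  # (150, 2)
```

With 11 inputs, the network has 156 parameters, and 150 training cases with 2 outputs give 300 residuals. This satisfies the requirement of the `lm` method.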

The procedure used a Levenberg-Marquardt training function and the mean squared error (MSE) for performance evaluation. The available data were randomly split into a 75% training and a 25% validation set, i.e., 150 randomly selected datasets served to train the network and 50 datasets were used to evaluate its performance. After training, performance was evaluated only in the validation split by calculating the chi2 statistic between the network output (estimated HB grades) and the clinical postoperative and follow-up HB grades. For more intuitive interpretation, chi2 values were transformed into Cramér’s V effect sizes. For 5x5 tables (estimated and clinical HB ranging from 1 to 5), values below 0.05 are considered negligible, 0.05–0.13 small, 0.13–0.22 medium, and above 0.22 large [15].
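The conversion from network output to chi2 and Cramér's V might look like the sketch below. Two steps are assumptions:

- Rounding and clipping the continuous outputs to grades 1–5.
- Pooling both outcomes of the 50 validation patients into one table (N = 100 observations).

With this pooling, V = sqrt(chi2 / (N (k - 1))) appears to reproduce the reported values, e.g. chi2 = 51.3 gives sqrt(51.3 / 400) ≈ 0.36. The helper names are hypothetical.

```python
import numpy as np

N_GRADES = 5  # HB 1..5, as for the 5x5 tables in the text

def to_grades(y):
    """Round continuous network outputs to integer HB grades 1..5 (assumption)."""
    return np.clip(np.rint(y), 1, N_GRADES).astype(int)

def chi2_statistic(estimated, clinical):
    """Pearson chi2 for the 5x5 table of estimated vs. clinical HB grades."""
    table = np.zeros((N_GRADES, N_GRADES))
    np.add.at(table, (np.asarray(estimated) - 1, np.asarray(clinical) - 1), 1)
    expected = (table.sum(axis=1, keepdims=True)
                * table.sum(axis=0, keepdims=True) / table.sum())
    # Cells in empty rows/columns have zero expected count and contribute nothing
    used = expected > 0
    return float((((table - expected) ** 2)[used] / expected[used]).sum())

def cramers_v(chi2, n_obs, k=N_GRADES):
    """Cramér's V for a k x k table: sqrt(chi2 / (N * (k - 1)))."""
    return float(np.sqrt(chi2 / (n_obs * (k - 1))))

# Both outcomes of 50 validation patients pooled -> N = 100 (assumption)
print(round(cramers_v(51.3, n_obs=100), 2))  # 0.36
```

For perfect agreement, chi2 reaches its maximum of N (k - 1), and V equals 1.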

Statistical evaluation of network performance

Training, and consequently performance, of neural networks depends on the random choice of training and validation splits as well as on the random initialization of synaptic weights between layers. To better estimate the overall performance of neural networks, we applied a bootstrapping technique to sample the distribution of performance observed across many networks. A single run of calculations was repeated 1000 times, yielding 1000 performance estimates, i.e., chi2 values of the comparison between network output and postoperative/follow-up outcome. Each run randomly assigned 150 datasets to the training and 50 datasets to the validation split. A neural network was then constructed with the training split, and chi2 values were calculated using only the validation split.
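The resampling loop can be sketched generically. `fit_fn` and `score_fn` are hypothetical placeholders for network training and chi2 scoring. In the usage example, a linear least-squares model and a correlation score stand in for them purely to keep the sketch short. Each run draws its split without replacement, so the loop corresponds to repeated random subsampling (Monte Carlo cross-validation) rather than a classical bootstrap with replacement.

```python
import numpy as np

def repeated_splits(X, y, fit_fn, score_fn, n_runs=1000, n_train=150, seed=0):
    """Collect validation scores over repeated random train/validation splits.

    fit_fn(X_train, y_train) -> model; score_fn(model, X_val, y_val) -> float.
    Both are placeholders (e.g. network training and chi2 scoring).
    """
    rng = np.random.default_rng(seed)
    scores = np.empty(n_runs)
    for run in range(n_runs):
        # New permutation each run: first n_train cases train, the rest validate
        order = rng.permutation(len(X))
        train, val = order[:n_train], order[n_train:]
        model = fit_fn(X[train], y[train])  # any random weight init happens here
        scores[run] = score_fn(model, X[val], y[val])
    return scores

# Stand-ins for illustration only: linear least squares instead of a network
def fit_linear(X, y):
    design = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def score_corr(coef, X, y):
    pred = np.column_stack([np.ones(len(X)), X]) @ coef
    return np.corrcoef(pred, y)[0, 1]

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=200)
scores = repeated_splits(X, y, fit_linear, score_corr, n_runs=200)
print(scores.mean())
```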

The mean and 95% confidence interval of the resulting distribution were taken as overall performance. For calculation of significance, the distribution was compared to a surrogate distribution using a Kolmogorov-Smirnov (KS) test. The surrogate distribution was constructed by shuffling the input data of the validation split with respect to the outcome values. Chi2 values were then calculated from the network output for these surrogate data. This procedure was also repeated 1000 times, yielding the surrogate distribution. The underlying hypothesis is that the ordered inputs should predict the outcome in the validation split better than the randomly shuffled surrogate.
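Summary statistics and the surrogate comparison can be sketched with SciPy. The confidence interval here is a normal-approximation CI of the mean over runs. This is an assumption suggested by the narrow intervals reported in the Results, not something the text specifies. `scipy.stats.ks_2samp` performs the two-sample Kolmogorov-Smirnov test. The helper names and the synthetic score distributions are illustrative.

```python
import numpy as np
from scipy import stats

def mean_ci(scores, level=0.95):
    """Mean over runs and a normal-approximation CI of that mean (assumption)."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    half = stats.norm.ppf(0.5 + level / 2) * scores.std(ddof=1) / np.sqrt(scores.size)
    return mean, (mean - half, mean + half)

def shuffled_inputs(X_val, rng):
    """Surrogate: permute validation input rows so they no longer match outcomes."""
    return X_val[rng.permutation(len(X_val))]

def surrogate_test(real_scores, surrogate_scores):
    """Two-sample Kolmogorov-Smirnov test of real vs. surrogate chi2 distributions."""
    return stats.ks_2samp(real_scores, surrogate_scores)

# Synthetic chi2 distributions from 1000 real and 1000 surrogate runs
rng = np.random.default_rng(7)
real = rng.normal(50.0, 8.0, size=1000)
surrogate = rng.normal(16.0, 5.0, size=1000)
mean, (lo, hi) = mean_ci(real)
result = surrogate_test(real, surrogate)
print(f"chi2 = {mean:.1f} (CI {lo:.1f} to {hi:.1f}), KS p = {result.pvalue:.2g}")
```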

Comparison of different input sets

The primary endpoint of our study was the performance of neural networks with the inputs traintime, tumor size and preoperative facial nerve function. Additionally, we evaluated performance when information on whether a separate intermedius nerve and/or A-train clusters were observed was added. Performance differences are discussed based on 95% confidence intervals (CI). Overlapping CI were interpreted as a lack of significant difference, which is considered conservative [16].

Evaluation of tumor size

The networks trained on traintime, tumor size and preoperative facial nerve function were further analyzed to study the influence of tumor size. To this end, the complete dataset was subdivided into groups according to Koos tumor size. Chi2 values from the comparison of outcomes with estimates of the previously trained networks were then calculated for each group individually. The rationale of this step was that, because tumor size is constant within each group, performance within a group cannot depend on tumor size. Because preoperative HB grades were comparable in most patients, and therefore also within the tumor size subgroups, the observed concordance must then depend on traintime. Mean chi2 values and 95% CI are reported over all 1000 randomizations. For evaluation of differences between tumor size categories, a general linear regression model (GLM) was fitted to the network estimates, taking Koos tumor size and the sample size of the groups into account. This approach was used to control for the rather different patient numbers in the tumor size groups, ranging from 18 with Koos 1 to 70 with Koos 3.
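The tumor size GLM can be sketched as an ordinary least squares fit of per-group chi2 values on Koos grade and group size, pooled over randomizations. The text does not give the exact model formula. The predictors, the helper `ols_with_t` and the synthetic chi2 values below are therefore assumptions. Only the group sizes (18, 57, 70 and 55 patients) are taken from the Patients section.

```python
import numpy as np
from scipy import stats

def ols_with_t(y, *predictors):
    """OLS with intercept; returns coefficients, t values and two-sided p values."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = len(y) - X.shape[1]
    sigma2 = resid @ resid / df
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t = beta / se
    p = 2.0 * stats.t.sf(np.abs(t), df)
    return beta, t, p

# One chi2 value per Koos group and randomization (synthetic values)
rng = np.random.default_rng(5)
koos = np.repeat([1, 2, 3, 4], 1000)
n_group = np.repeat([18, 57, 70, 55], 1000)  # patients per Koos grade
chi2 = 2.0 + 3.0 * koos + 0.1 * n_group + rng.normal(0, 0.5, koos.size)
beta, t, p = ols_with_t(chi2, koos, n_group)
print(np.round(beta, 2), np.round(t[1], 1))
```

Including group size as a covariate separates a genuine tumor size effect from the mechanical dependence of chi2 on the number of observations in each group.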

Influence of a separate intermedius nerve

The influence of a separate intermedius nerve on neural network performance was investigated. To this end, network estimates were categorized into groups according to the intraoperative observation of a separate intermedius nerve or lack thereof. Based on all 200 patients, chi2 concordance statistics with clinical HB grades were calculated for each group and each of the 1000 randomizations. Differences were again statistically evaluated with the KS test. Unlike the remaining analyses, this evaluation was performed in the complete sample rather than only in the validation split. Because 50 cases were randomly selected in each randomization, restricting it to the validation split would have led to varying and frequently unbalanced percentages of cases with a separate intermedius nerve. The main goal of this analysis step was to evaluate whether the best method could compensate for the impact of a separate intermedius nerve or whether such an influence would still be present. Since chi2 statistics and, to some degree, Cramér’s V are sensitive to sample size, comparison with the performance of other neural networks evaluated only in the smaller validation split is limited.

Results

A summary of the results of conventional analysis is provided in Table 1. Results of neural network analysis are summarized in Table 2; further details on the impact of a separate intermedius nerve are given in Table 3.

Table 1 Correlations with postoperative and follow-up HB of conventional analysis. Significant correlations are printed in bold.

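Group | Parameter | HB | Spearman correlation r | p
All patients | Mean traintime | Postop. | 0.397 | <0.0001
All patients | Mean traintime | Follow-up | 0.323 | <0.0001
All patients | Mean traintime without clusters | Postop. | 0.442 | <0.0001
All patients | Mean traintime without clusters | Follow-up | 0.350 | <0.0001
All patients | Tumor size (Koos) | Postop. | 0.456 | <0.0001
All patients | Tumor size (Koos) | Follow-up | 0.437 | <0.0001
All patients | Mean traintime, tumor size controlled (partial correlation) | Postop. | 0.208 | <0.005
All patients | Mean traintime, tumor size controlled (partial correlation) | Follow-up | 0.123 | 0.084
Without sep. intermedius nerve | Mean traintime | Postop. | 0.564 | <0.0001
Without sep. intermedius nerve | Mean traintime | Follow-up | 0.511 | <0.0001
Without sep. intermedius nerve | Tumor size (Koos) | Postop. | 0.564 | <0.0001
Without sep. intermedius nerve | Tumor size (Koos) | Follow-up | 0.509 | <0.0001
With sep. intermedius nerve | Mean traintime | Postop. | 0.198 | 0.0499
With sep. intermedius nerve | Mean traintime | Follow-up | 0.08 | >0.1
With sep. intermedius nerve | Tumor size (Koos) | Postop. | 0.317 | <0.0001
With sep. intermedius nerve | Tumor size (Koos) | Follow-up | 0.341 | <0.0001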
Table 2 Performance of neural network estimates. Results in the validation split (50 patients) are reported.

 
Inputs | Chi2 mean | Chi2 95% CI | Cramér’s V
Koos only | 30.6 | 30.1–31.1 | 0.28
Mean traintime only | 31.9 | 30.6–33.2 | 0.28
Mean traintime only (without clusters) | 47.7 | 45.9–49.5 | 0.35
Traintime, Koos, preOP HB | 51.3 | 49.7–53.0 | 0.36
Traintime, Koos, preOP HB + sep. intermedius | 49.1 | 47.6–50.7 | 0.35
Traintime, Koos, preOP HB + A-train cluster | 49.7 | 48.1–51.3 | 0.35
Traintime, Koos, preOP HB + sep. intermedius and A-train cluster | 44.8 | 43.4–46.2 | 0.33
Traintime (without clusters), Koos, preOP HB | 54.8 | 53.0–56.7 | 0.37
Traintime (without clusters), Koos, preOP HB + sep. intermedius | 51.6 | 50.0–53.2 | 0.36
Traintime (without clusters), Koos, preOP HB + A-train cluster | 52.1 | 50.4–53.8 | 0.36
Traintime (without clusters), Koos, preOP HB + sep. intermedius and A-train cluster | 49.0 | 47.4–50.7 | 0.35

Table 3 Comparison of neural network performance in patients with and without a separate intermedius nerve. Results in the validation split are reported, grouped according to intraoperative observation of a separate intermedius nerve. Due to the lower sample number in each group (on average approx. 50%, reflecting the proportion of patients with a separate intermedius nerve), chi2 and Cramér’s V are generally lower compared to Table 2.

Conventional analysis

In the complete dataset, traintime was significantly correlated with postoperative (r = 0.397, p < 0.0001) and follow-up HB grades (r = 0.323, p < 0.0001). In patients without a separate intermedius nerve (n = 101), correlations were higher (r = 0.564 and r = 0.511, p < 0.0001). Correspondingly, correlations were low in patients with a separate intermediate nerve (r = 0.198, p = 0.0499 and r = 0.08, p > 0.1). A-train clusters were more frequently observed in patients with a separate intermedius nerve. However, manual removal of A-train clusters resulted in only a negligible improvement in the complete group (r = 0.442 and r = 0.350, p < 0.0001).

Koos tumor sizes also correlated with postoperative outcomes (r = 0.456, p < 0.0001, follow-up r = 0.437, p < 0.0001). Patients with a separate intermediate nerve had larger tumors than patients without (p = 0.0012, chi2 = 15.96, chi-square test).

In patients with a separate intermediate nerve, correlations of tumor size with facial nerve function were lower (r = 0.317 and r = 0.341, p < 0.0001) compared to cases without (r = 0.564 and r = 0.509, p < 0.0001).

Controlling for tumor size, the partial correlation between traintime and immediate postoperative facial nerve function remained significant (r = 0.208, p < 0.005), but the partial correlation at follow-up did not (r = 0.123, p = 0.084).

Neural networks

Using traintime, tumor size and preoperative HB grades as inputs, the mean chi2 over 1000 randomizations for the comparison between neural network estimates and postoperative outcomes was 51.3 (p < 0.0001), corresponding to a Cramér’s V of 0.36, evaluated only in the validation split (n = 50 patients, 2 outcomes: immediate postoperative and follow-up; Table 2).

Using the observation of a separate intermedius nerve or the occurrence of A-train clusters as an additional input yielded comparable results. Separate intermedius nerve: chi2 = 49.1 (p < 0.0001), Cramér’s V = 0.35. A-train clusters: chi2 = 49.7 (p < 0.0001), Cramér’s V = 0.35. Both: chi2 = 44.8 (p < 0.0001), Cramér’s V = 0.33. Performance using only tumor size or only mean traintime over all channels yielded considerably lower results: chi2 = 30.6 and 31.9 (p < 0.0001, Table 2).

All input combinations were re-evaluated using traintime with manually removed A-train clusters (“corrected traintime”). This resulted in a considerable improvement even when only mean traintime over all channels was used as input: chi2 = 47.7 (p < 0.0001), Cramér’s V = 0.35 (Table 2). Networks with tumor size, preoperative HB and corrected traintime also achieved a higher mean chi2 of 54.8 (p < 0.0001), Cramér’s V = 0.37. Adding a separate intermedius nerve yielded chi2 = 51.6 (p < 0.0001), Cramér’s V = 0.36. Adding A-train clusters yielded chi2 = 52.1 (p < 0.0001), Cramér’s V = 0.36, and adding both yielded chi2 = 49.0 (p < 0.0001), Cramér’s V = 0.35 (Table 2).

Tumor size

Analysis of concordance with postoperative facial nerve function in Koos subgroups yielded chi2 = 4.8 (CI 4.7–5.0), Cramér’s V = 0.08 in the Koos 1 group, chi2 = 35.3 (34.5–36.0), Cramér’s V = 0.21 in Koos 2, chi2 = 118.6 (115.2–122.1), Cramér’s V = 0.39 in Koos 3 and chi2 = 61.3 (60.1–62.5), Cramér’s V = 0.28 in the Koos 4 group. Differences of chi2 values between groups reached statistical significance, also after correcting for the expected interaction of tumor size and sample size (GLM analysis, F = 2380, p < 0.0001 for the regression model, t = -40.8, p < 0.0001 for factor tumor size).

Influence of a separate intermedius nerve

Across all 200 patients, performance was significantly better in patients without a separate intermediate nerve using the best set of inputs (preoperative HB, tumor size and corrected traintime): chi2 = 164.2 vs. 65.9 (p < 0.0001), corresponding to a Cramér’s V of 0.46 (n = 99 patients) and 0.29 (n = 101 patients). Surprisingly, networks using traintime with A-train clusters removed showed improved performance only in patients without a separate intermedius nerve. Best chi2 with vs. without cluster traintime was 32.7 vs. 35.6 in patients without and 18.3 vs. 17.0 in patients with a separate intermedius nerve (Table 3).

Discussion

We applied machine learning approaches in a group of 200 patients undergoing surgery for vestibular schwannoma. Our results show that these methods can extract and combine information from preoperative facial nerve function, tumor size and intraoperative A-train quantity to estimate postoperative facial nerve outcome. The achieved performance exceeds that obtained from any of the analyzed features alone. Notably, predictive performance was also present within subgroups of patients with identical Koos tumor size. Performance did not improve when observation of a separate intermediate nerve and/or detection of A-train clusters were added to the analysis. Prediction did improve, however, when A-train clusters were removed from the detected traintime. Subgroup analysis showed that this effect is mainly due to improved performance in patients without a separate intermedius nerve.

Our previous studies on estimation of postoperative functional outcomes after vestibular schwannoma surgery showed that an intermediate nerve running separately from the main facial nerve trunk can give rise to an excessive amount of A-train activity [6, 11]. As the resulting traintime is not correlated with postoperative palsy in these cases, outcome estimation based on free-running EMG alone is severely limited. Since observation of a separate intermedius nerve is related to tumor size in a non-linear manner [11], and tumor size itself yields predictive information [4, 17, 18], we hypothesized that leveraging this interaction could improve outcome estimation. We further assumed that this relationship would be shaped by the intricate anatomy of the cerebello-pontine angle, the impact of the tumor, and individual topographic and surgical characteristics. This motivated us to use machine learning approaches, given their ability to discover and leverage complex, multivariate and non-linear relationships in data.

Indeed, integrating preoperative facial nerve function, traintime and tumor size outperformed postoperative HB-grade estimation using only Koos tumor size or traintime. The latter two yielded the lowest performance, although a Cramér’s V of 0.39 and 0.40 would still be considered a large effect size or concordance. Although performance was generally lower in patients with a separate intermedius nerve, integrating information of traintime, tumor size and preoperative function also resulted in improved estimation in this subgroup.

Preoperative facial nerve function and tumor size have been shown to impact intraoperative monitoring modalities and their interpretation. Facial motor evoked potentials (FMEP) for example correlate with tumor size already at the start of surgery [19], while interpretation of traintime should take preoperative deficits into account [3]. Our results show that the neural network approach is able to integrate these different modalities of information, effectively implementing such clinical recommendations in a formalized and objective manner.

Using traintime with manually removed A-train clusters resulted in a considerable improvement even if only the mean traintime was considered. While traintime including A-train clusters yielded one of the lowest performances, this correction increased chi2 from 31.9 to 47.7, corresponding to an improvement in Cramér’s V from 0.39 to 0.49. Combined with preoperative HB and tumor size, corrected traintime yielded the best performance of all tested combinations. The removal was based on our previous finding that patients with a separate intermedius nerve show A-train clusters significantly more often than patients in whom such a split nerve course was not observed [11]. We argued that such A-train activity in most recording channels within a short time window is an expression of a hyperexcitable or more vulnerable intermedius nerve. The phenomenon is reminiscent of the large quantities of A-train activity often observed in patients with previous surgery or irradiation [5], which likewise do not lead to significant postoperative functional deterioration.

The result that removing A-train clusters is beneficial for HB estimation seems to support the idea that such excessive, clinically not informative traintime may be caused by a separate intermedius nerve [6, 11]. In this light, it is however surprising that providing the observation of such a separate intermedius or the presence of A-train clusters to neural networks was not helpful and even partially decreased performance. Furthermore, the effect was largely present in the subgroup of patients without separate intermedius nerve, while patients with such a nerve course did not benefit or even showed reduced performance (Table 3).

Consequently, the results indicate that A-train clusters generally over-represent actual damage to the facial nerve, not only when a split nerve course is encountered. The corresponding cluster traintime should therefore be given less weight than traintime from singular A-trains, or removed entirely. Contrary to our hypothesis, however, this correction step was not sufficient in the current study to ameliorate the impact of a separate intermediate nerve.

There are several potential reasons for this limited impact. First, for practical reasons, A-train clusters were identified visually. This strategy may have resulted in marking only the clearest occurrences of cluster-like A-trains, while the phenomenon might in fact be subtler and manifest rather as a “spectrum of over-representation”. Furthermore, more detailed characteristics such as topographical distribution, time and distance between occurrences, and relationship to singular A-train patterns were not evaluated.

Even if such fine-grained electrophysiological information did not alleviate the intermedius “issue”, neural networks offer further options for improvement. The methodology allows for integration of additional information sources beyond tumor size, preoperative function and continuous EMG. It is conceivable that the clinical value offered by alternative monitoring techniques, such as FMEP [19–21] or direct electrical stimulation [22], could be utilized for a multimodal monitoring technique. In addition, determination of the course of the facial nerve by tractography [23] could add valuable anatomical information.

Overall, however, most estimated HB grades corresponded well to clinical evaluation. In the range of moderate facial nerve palsy, we observed deviations by one, sometimes two grades (Fig. 3). Such variability may partially be caused by the subjective nature of the HB grading system itself or its practical application [24–26]. Scheller et al. [14] investigated the interobserver variability of the HB grading system as part of a randomized multi-center phase III trial. In that study, too, HB grades varied between observers by one or two grades, to an extent comparable to our results. HB grades were also most consistent when postoperative facial nerve function was normal or mildly impaired. The neural network estimates are therefore well within the range of this variability. Further improvement may require a more objective grading system with better interrater reliability [26–29].

Conclusions

In conclusion, neural networks using traintime, preoperative facial nerve function and tumor size can estimate postoperative HB grades with good accuracy. However, they do not fully compensate for false-positive A-train activity associated with a separate intermedius nerve. Removal of A-train cluster traintime nevertheless seems advisable in cases without a separate course of the intermediate nerve. Neural networks can integrate information from different pre- and intraoperative diagnostic methods and may enable comprehensive multimodal monitoring.

Declarations

Funding

This work was supported by the Deutsche Forschungsgemeinschaft (PR-1275/1-2 and RA 2062/3-1).

Competing interests

The authors have no relevant financial or non-financial interests to disclose.

Author Contributions

All authors contributed to the study conception and design. Data collection was performed by CS, CSt and JP. Analysis was performed by SR and MH. The first draft of the manuscript was written by SR and MH and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

References

  1. Prell J, Strauss C, Plontke SK, Rampp S. Intraoperative Funktionsüberwachung des N. facialis: Operationen an Vestibularisschwannomen. HNO. 2017;65:404–12.
  2. Stankovic P, Wittlinger J, Georgiew R, Dominas N, Hoch S, Wilhelm T. Continuous intraoperative neuromonitoring (cIONM) in head and neck surgery-a review. HNO. 2020;68:86–92.
  3. Prell J, Rachinger J, Scheller C, Alfieri A, Strauss C, Rampp S. A real-time monitoring system for the facial nerve. Neurosurgery. 2010;66:1064–73; discussion 1073.
  4. Prell J, Strauss C, Rachinger J, Alfieri A, Scheller C, Herfurth K, et al. Facial nerve palsy after vestibular schwannoma surgery: Dynamic risk-stratification based on continuous EMG-monitoring. Clin Neurophysiol. 2014;125:415–21.
  5. Rampp S, Strauss C, Scheller C, Rachinger J, Prell J. A-trains for intraoperative monitoring in patients with recurrent vestibular schwannoma. Acta Neurochir. 2013;155:2273–9.
  6. Prell J, Strauss C, Rachinger J, Scheller C, Alfieri A, Herfurth K, et al. The intermedius nerve as a confounding variable for monitoring of the free-running electromyogram. Clin Neurophysiol. 2015;126:1833–9.
  7. Strauss C, Prell J, Rampp S, Romstöck J. Split facial nerve course in vestibular schwannomas. J Neurosurg. 2006;105:698–705.
  8. Ashram YA, Jackler RK, Pitts LH, Yingling CD. Intraoperative electrophysiologic identification of the nervus intermedius. Otol Neurotol. 2005;26:274–9.
  9. Alfieri A, Fleischhammer J, Peschke E, Strauss C. The nervus intermedius as a variable landmark and critical structure in cerebellopontine angle surgery: an anatomical study and classification. Acta Neurochir. 2012;154:1263–8.
  10. Alfieri A, Rampp S, Strauss C, Fleischhammer J, Rachinger J, Scheller C, et al. The relationship between nervus intermedius anatomy, ultrastructure, electrophysiology, and clinical function. Usefulness in cerebellopontine microsurgery. Acta Neurochir. 2014;156:403–8.
  11. Rampp S, Illert J, Krempler K, Strauss C, Prell J. A-train clusters and the intermedius nerve in vestibular schwannoma patients. Clin Neurophysiol. 2019;130:722–6.
  12. House JW, Brackmann DE. Facial nerve grading system. Otolaryngol Head Neck Surg. 1985;93:146–7.
  13. Koos WT, Day JD, Matula C, Levy DI. Neurotopographic considerations in the microsurgical treatment of small acoustic neurinomas. J Neurosurg. 1998;88:506–12.
  14. Scheller C, Wienke A, Tatagiba M, Gharabaghi A, Ramina KF, Scheller K, et al. Interobserver variability of the House-Brackmann facial nerve grading system for the analysis of a randomized multi-center phase III trial. Acta Neurochir. 2017;159:733–8.
  15. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale: L. Erlbaum Associates; 1988.
  16. Cumming G, Finch S. Inference by Eye: Confidence Intervals and How to Read Pictures of Data. Am Psychol. 2005;60:170–80.
  17. Falcioni M, Fois P, Taibah A, Sanna M. Facial nerve function after vestibular schwannoma surgery. J Neurosurg. 2011;115:820–6.
  18. Samii M, Matthies C. Management of 1000 vestibular schwannomas (acoustic neuromas): the facial nerve–preservation and restitution of function. Neurosurgery. 1997;40:684–5.
  19. Matthies C, Raslan F, Schweitzer T, Hagen R, Roosen K, Reiners K. Facial motor evoked potentials in cerebellopontine angle surgery: Technique, pitfalls and predictive value. Clin Neurol Neurosurg. 2011;113:872–9.
  20. Dong CC, Macdonald DB, Akagami R, Westerberg B, Alkhani A, Kanaan I, et al. Intraoperative facial motor evoked potential monitoring with transcranial electrical stimulation during skull base surgery. Clin Neurophysiol. 2005;116:588–96.
  21. Greve T, Wang L, Thon N, Schichor C, Tonn JC, Szelényi A. Prognostic value of a bilateral motor threshold criterion for facial corticobulbar MEP monitoring during cerebellopontine angle tumor resection. J Clin Monit Comput. 2020;34:1331–41.
  22. Quimby AE, Lui J, Chen J. Predictive Ability of Direct Electrical Stimulation on Facial Nerve Function Following Vestibular Schwannoma Surgery: A Systematic Review and Meta-analysis. Otol Neurotol. 2021;42:493–504.
  23. Savardekar AR, Patra DP, Thakur JD, Narayan V, Mohammed N, Bollam P, et al. Preoperative diffusion tensor imaging-fiber tracking for facial nerve identification in vestibular schwannoma: A systematic review on its evolution and current status with a pooled data analysis of surgical concordance rates. Neurosurg Focus. 2018;44.
  24. Ahrens A, Skarada D, Wallace M, Cheung JY, Neely JG. Rapid simultaneous comparison system for subjective grading scales for facial paralysis. Am J Otol. 1999;20:667–71. Available from: http://www.ncbi.nlm.nih.gov/pubmed/10503592.
  25. Alicandri-Ciufelli M, Piccinini A, Grammatica A, Salafia F, Ciancimino C, Cunsolo E, et al. A step backward: The “Rough” facial nerve grading system. J Craniomaxillofac Surg. 2013;41.
  26. De Ru JA, Braunius WW, Van Benthem PPG, Busschers WB, Hordijk GJ. Grading facial nerve function: Why a new grading system, the MoReSS, should be proposed. Otol Neurotol. 2006;27:1030–6.
  27. Coulson SE, Croxson GR, Adams RD, O’Dwyer NJ. Reliability of the “Sydney,” “Sunnybrook,” and “House Brackmann” facial grading systems to assess voluntary movement and synkinesis after facial nerve paralysis. Otolaryngol Head Neck Surg. 2005;132:543–9.
  28. Murty GE, O’Donoghue GM, Bradley PJ, Diver JP, Kelly PJ. The Nottingham System: Objective assessment of facial nerve function in the clinic. Otolaryngol Head Neck Surg. 1994;110:156–61.
  29. Fattah AY, Gurusinghe ADR, Gavilan J, Hadlock TA, Marcus JR, Marres H, et al. Facial nerve grading instruments: Systematic review of the literature and suggestion for uniformity. Plast Reconstr Surg. 2015:569–79.