Speech Perception Across the Lifespan by Means of Artificial Intelligence

Hearing loss is one of the most common diseases worldwide. It affects communicative abilities in all age groups, although elderly people suffer from it most frequently. The aim of this study was to describe the relationship between hearing loss, age, and speech perception. We employed a Random Forest Regression model, applied to a large clinical data set of 19,801 ears covering all degrees of hearing loss. The model allows the estimation of age-related decline in speech perception in quiet, with the effect of pure-tone hearing loss completely separated. Our results show that speech scores depend on the specific type of hearing loss and on age. We found age effects for all degrees of hearing loss. A deterioration in speech perception of up to 25 percentage points across the whole life span was observed; the largest decrease was 10 percentage points per life decade. The decline can be attributed to a distortion component of presbyacusis, which is not measured by pure-tone audiometry.


Introduction
More than 5% of the world's population, approximately 360 million people, suffer from disabling hearing loss. Hearing disability is associated with reduced speech perception and, in consequence, reduced communication function. Hearing deteriorates with age 1 . In particular, auditory sensitivity and speech perception decline after age 60. Given our ageing society and the prevalence of age-related hearing loss, hearing loss is a common public-health issue affecting almost all older adults. This age-related hearing loss (ARHL) has no adequate causal therapy, and technical solutions such as hearing aids, cochlear implants and assistive devices improve hearing thresholds but do not restore speech recognition sufficiently.
Speech perception deficits in persons with hearing impairment are mainly attributable to decreased audibility of the speech signal over part or all of the speech frequency range. A second contribution stems from impaired processing of the audible speech signal, which results in reduced clarity 2 . Similarly, Plomp 3 subdivided the components of speech recognition deficits in sensorineural hearing loss into Class A (attenuation) and Class D (distortion). The Class A component relates to Carhart's acuity deficit 2 and can be compensated for by frequency-dependent gain aiming at improving audibility. The Class D component relates to Carhart's clarity deficit 2 and represents the distortion that is produced by damage to the auditory periphery and the mechanisms of neural processing. Evidently, this distortion cannot be compensated for by acoustic amplification. Plomp 3 concluded that the Class D component may limit the benefit of hearing aids.
Both attenuation and distortion are part of ARHL. A large number of studies have focussed upon hearing in the elderly and have investigated pure-tone loss, speech perception, and other suprathreshold abilities such as spatial processing and recognition of distorted speech. In many studies 1, 4-17 speech recognition in older adults has been investigated. The interpretation of these results remains challenging, as pure-tone thresholds change substantially with increasing age and pure-tone loss is the primary predictor for speech recognition. Hence, it is necessary to correct for the effect of pure-tone loss when investigating the effect of age on speech perception. One of the first attempts to do this was described by Jerger 4 in a report on speech recognition in a large group of older subjects. He analysed scores from the clinical records of 2162 patients. With subjects grouped according to age and average hearing loss at 0.5, 1 and 2 kHz, results suggested that speech recognition, defined as the maximum score obtained using a monosyllabic word list (WRS max ), declines above the age of sixty. In particular, he found that age had an effect on speech recognition for individuals with mild hearing loss of approximately 4% per life decade, but it had a greater effect upon those with higher degrees of hearing loss, e.g. 10% per decade.
Unfortunately, he did not report on hearing loss at higher frequencies. There is a long-standing consensus that hearing thresholds at these higher frequencies are particularly worse for older subjects 1, 5 . Consequently, there is a systematic overestimation of the age effect 12 .
Several other studies have revealed evidence that deterioration in speech understanding occurs in addition to deterioration in hearing sensitivity and includes components beyond elevated hearing thresholds [6][7][8][9][10][11][12] . However, the authors 11, 12 highlighted the challenge of separating varying auditory thresholds from age, a factor affecting all sensory modalities 13 . In recent studies [14][15][16] , speech perception and its relation to age were investigated either by correcting for pure-tone loss 15,16 or by using a longitudinal study design 14 . In a clinical population Hoppe et al. 15 investigated speech perception with hearing aids and WRS max for different age groups in relation to average pure-tone loss at 0.5, 1, 2 and 4 kHz (4FPTA). They found a monotonic decrease in speech perception with increasing age and a significant drop of about 2-4% per decade. Müller et al. 16 investigated the WRS max as a function of age.
After correcting for pure-tone loss they found a significant, though smaller, drop for people aged above 70 years of about 2-3% per decade. Neither study included hearing thresholds beyond 4 kHz, and therefore a small overestimation of the influence of age cannot be excluded. However, Dubno et al. 14 found a larger effect, around 7-8% per life decade. They performed a longitudinal study including 256 subjects with age-related hearing loss, aged 50-82 years, over a period of 3-15 years. The speech recognition scores were corrected for the change in hearing thresholds during the observation phase; this was done by using the individuals' articulation index as an importance-weighted metric for speech audibility. Unfortunately, longitudinal studies suffer from other disadvantages relating to population size, loss to follow-up etc., and their duration can approach the limits of the clinician's working life-span. The special characteristics of the study population and methods (neither the WRS max nor hearing-aid scores were measured) differ from the studies mentioned above. This impedes a direct comparison with the above-mentioned studies and therefore does not imply a contradiction amongst them [14][15][16] . In summary, there is evidence that speech recognition in older subjects declines with increasing age more rapidly than their pure-tone thresholds would suggest. The effect seems to be small and depends on age and degree of hearing loss.
The goal of this study was the development of a tool to describe the relation between hearing loss, age, and speech perception. It uses a machine-learning algorithm, Random Forest Regression (RFR) 18 . This algorithm was applied to a large data set from routine clinical audiometry in order to investigate the influence of age. The result is a representation of the relation between pure-tone thresholds and age, with a clear distinction of both factors on the input side and speech perception on the target side. The tool allows description of the influence of age, with hearing thresholds kept fixed. RFR is an algorithm that uses an ensemble of decision-tree-based regressions to determine a response from a set of predictor variables. It does not rely on any particular assumptions regarding data distribution. Hence, it is ideally suited for modelling a large clinical audiometric database.
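As a loose illustration of this setup (not the authors' Matlab implementation), a random-forest regression with thresholds and age as predictors and a word recognition score as target can be sketched in Python with scikit-learn; all data here are synthetic and the variable names are our own:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic predictors: pure-tone thresholds (dB HL) at 11 audiometric
# frequencies plus age in years -- stand-ins for the clinical inputs.
thresholds = rng.uniform(0, 90, size=(n, 11))
age = rng.uniform(18, 99, size=(n, 1))
X = np.hstack([thresholds, age])

# Synthetic target: a word recognition score in percent (illustrative only,
# not a model of real speech audiometry).
y = np.clip(100.0 - thresholds.mean(axis=1) - 0.1 * (age.ravel() - 18), 0, 100)

# 100 trees, matching the Matlab default retained in the study.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
pred = model.predict(X[:5])
```

Because a forest averages decision-tree outputs, predictions stay within the range of the training scores, and no distributional assumptions about thresholds or age are required, which is the property highlighted above.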

Results
The stacked bar plot ( Fig. 1) shows the case distribution in our clinical population (N = 19,801). The mean ages of the different groups were 50, 61, 66, 65 and 59 years for WHO 0 , WHO 1 , WHO 2 , WHO 3 and WHO 4 , respectively.
The vast majority (77%) of cases involved persons between 40 and 80 years of age. The subjects aged 40 to 80 years dominated all WHO grades except WHO 0 . The smallest data coverage with respect to age and hearing loss was observed for very young adults in the WHO 4 group and for subjects above 80 years of age in the WHO 0 group.
The speech audiometric results for the model's target scores, WRS 65 and WRS max , are shown in Fig. 2A and 2B, respectively. For both measures the median decreased monotonically with increasing degrees of hearing loss. The variability for WRS 65 was largest for WHO 1 , while for WRS max the largest variability was found for WHO 3 . In this rather rough classification the interpretation of some outliers may benefit from additional information about the actual cause of hearing loss. In particular, the WHO classification employs the hearing thresholds at only four frequencies, while other frequencies are not considered. The lowest quartile of the WHO 0 cases shows a WRS 65 lower than 95%. In this subgroup the mean threshold for high frequencies (> 4 kHz) was 48 dB, while the overall mean threshold for high frequencies was 30 dB in the WHO 0 group. Among the 1075 WHO 3 cases, 41 exhibited a WRS 65 larger than 0%. Those cases had a mean threshold for low frequencies (< 0.5 kHz) of 32 dB, while the overall mean threshold for low frequencies was 54 dB.

Model setup
The RFR was trained with 80% of the data (training group). The model was then tested on the remaining 20% of the study population (evaluation group). Before assignment to groups was performed, the data sets were sorted according to 4FPTA and age. Subsequently, every fifth data set was assigned to the evaluation group. The pure-tone thresholds at all frequencies and the patients' age were the input (predictor) variables, while WRS 65 , WRS max and L max were the targets. For each of the three output variables a separate RFR model was built. The most important parameter with respect to overfitting is the number of decision trees within a random forest. As a parameter for estimating the RFR performance, the median absolute error (MAE, resulting from measured minus predicted score) was used for both the training group and the evaluation group. The MAE in the evaluation group was larger for lower numbers (< 10) of decision trees, while remaining essentially constant over a large range (50-1000) of numbers of trees. Therefore, the Matlab default of 100 was retained. Other parameters for the setup of a single decision tree were found to be of less importance for the RFR performance and were therefore kept at default values.
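The sorting-based 80/20 split and the MAE criterion described above can be sketched as follows (a Python approximation of the Matlab procedure; the function names and the toy 4FPTA/age values are ours):

```python
import numpy as np

def split_every_fifth(fpta4, age):
    """Sort cases by 4FPTA, then age, and assign every fifth sorted case
    to the evaluation group, the rest to training (an approx. 80/20 split),
    mirroring the procedure described in the text."""
    order = np.lexsort((age, fpta4))   # primary key: 4FPTA, secondary: age
    eval_idx = order[4::5]             # every fifth case in sorted order
    train_idx = np.setdiff1d(order, eval_idx)
    return train_idx, eval_idx

def median_absolute_error(measured, predicted):
    # MAE as used in the study: the median of |measured - predicted|.
    return float(np.median(np.abs(np.asarray(measured) - np.asarray(predicted))))

# Toy data for ten cases (invented values, for illustration only).
fpta4 = np.array([10, 55, 30, 70, 20, 45, 15, 60, 35, 25])
age = np.array([40, 60, 50, 70, 30, 55, 45, 65, 52, 48])
train, ev = split_every_fifth(fpta4, age)
```

Sorting by 4FPTA and age before taking every fifth case keeps both groups balanced across degree of hearing loss and age, unlike a purely random split.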
The number of nodes per binary decision tree varied for each model: around 2150 for WRS max , around 2700 for WRS 65 , and around 3650 for L max . Table 1 summarises the performance of the model as assessed by MAE for both the training and the evaluation group. Owing to the composition of our study population, WHO 0 is by far the largest group.
The MAE of this group would have dominated the overall summary. For this reason, Table 1 summarises the error estimation for each grade of hearing loss separately. Evidently, there is great variation in the MAE among the WHO groups and between the target variables. The largest prediction errors were observed in WHO 2 for WRS 65 and in WHO 3 and WHO 4 for WRS max . For those WHO groups the MAE of the training and evaluation groups differed by a factor of 1.5-1.7.

Table 1: Median absolute error of the machine-learning prediction model for the monosyllabic score at a presentation level of 65 dB SPL (WRS 65 ), the maximum word recognition score (WRS max ) and the presentation level for the maximum word recognition score (L max ).

Application of the prediction model

One possible application of the prediction model is shown in Fig. 3. For the Bisgaard audiogram types 19 the model input was a pure-tone audiogram according to the flat (N-type) or steep (S-type) audiograms.
The pure-tone thresholds were kept constant while the subjects' age varied between 18 and 99 years. Owing to the relation between age and hearing thresholds, hardly any subjects aged > 85 years were present in our population for groups N 1 and S 1 . Consequently, the model could not be trained in this range, so it was excluded from the calculations. Linear regression models were fitted for all curves and then tested against a constant model, including a Bonferroni-Holm correction. The determined slopes of the linear regression analysis are shown in Fig. 3D, E, and F for WRS 65 , WRS max , and L max , respectively. Figure 3A shows that the model results in a WRS 65 which decreases, mostly monotonically, with age. However, the model suggests that age effects depend on the specific age range, and also on the degree and kind of hearing loss as indicated by the Bisgaard types 19 . For types N 3 , S 1 , and S 3 the model output indicates a decrease of WRS 65 with age. The model results become more complex if WRS max and L max are considered, as shown in Fig. 3B and C, respectively. For all types except N 6 , the presentation level for WRS max increases with age. A considerable decrease in score can be observed in N 6 , accompanied by a slight but significant decrease of L max . For the N 4 and S 3 types the model gives a significant decrease in WRS max , which is somewhat offset by an increased presentation level L max for these types. For all other types the WRS max did not change significantly with age; however, for these types the model results in an increased presentation level.

Discussion
The machine-learning approach allows description of the age-related decline of speech perception. In comparison with previous studies, more detailed information about the time course and amount of degradation was achieved. The RFR model allows the quantitative description of the two basic impacts of hearing loss and their relation to age: on the one hand the impact of the attenuation component, and on the other the impact of the age-related increase of the distortion components. The RFR model is able to take into account frequency-specific differences and to separate pure-tone threshold completely from age. It therefore offers the opportunity to overcome an inherent bias in previous investigations 4,8,11,15,16 by isolating age-related threshold shifts from age-related decline in speech perception as such.
This study should not be misunderstood as an attempt to predict speech perception scores on the basis of pure-tone loss. These scores have in any case to be measured individually; the large variability of individual scores alone necessitates speech audiometry. The purpose of the model in this study is the prediction for larger patient populations with respect to specific audiogram types and age. It can be seen in Fig. 3 that those age-related changes are present for the entire duration of adulthood. However, apart from the fact that higher age relates to lower speech perception scores, no common quantitative trend for all age groups and pure-tone hearing losses can be discerned. This may be regarded as the major outcome of the model calculations: the measurable age-related decline in speech perception depends on the age range considered, the specific audiogram, and the specific application of speech audiometry.
Owing to saturation effects of the WRS 65 measured at typical conversation level, we observe the largest age effect for moderate hearing losses (N 3 -type audiograms). For the WRS max measured at substantially higher levels, the largest effects were observed for audiogram types corresponding to severe hearing losses (N 4 , N 5 , N 6 ). However, a potential decrease of the WRS max can be substantially counterbalanced by an increased presentation level for some audiogram types. This finding is not consistent across all hearing losses. For the N 6 group, with the largest age-related decline, we also see a decreased tolerance for higher presentation levels. This might reflect certain underlying pathomechanisms which are more likely to result in this audiogram type compared with others. Complementary to the attenuation and distortion components of hearing loss 2, 3 , a causal and more differentiated breakdown with respect to presbyacusis was proposed early on 20,21 . Initially, five main types were proposed, namely sensory, neural, metabolic, mechanical, and vascular presbyacusis 20 . This was complemented by the term central presbyacusis in order to reserve the term neural for degeneration of the cochlear nerve 21 . Sensory presbyacusis is congruent with the attenuation component and is, as pointed out above, completely separated from the relationships in Fig. 3; the effects of all the other types of presbyacusis are not. Moreover, the specific and different root causes may potentially explain why, for some degrees of hearing loss, different changes in speech perception occur in different life decades. However, possible interactions between the main types of presbyacusis are still not completely understood 22 .
It is not possible to confirm all these explanatory hypotheses by retrospective data analysis, a fact that clearly underlines the limits of our study design. We found differences in age effects in comparison with some of the studies referred to above. This is partly due to the neglect of hearing loss at higher frequencies for the elderly in those studies. On the other hand, for some hearing losses and audiogram types, this study may underestimate age effects, as ceiling effects of speech tests in quiet are included.
Another unique feature of this study is the inclusion of a considerable number of subjects with mild hearing loss, as seen in group S 1 . Even in that group, age effects play a part. The WRS 65 in particular illustrates how everyday communicative ability in quiet may already be affected by mild to moderate hearing loss in a population in which the use of hearing aids does not reach the penetration level needed 23 .
Other possible applications of the model are related to acoustic amplification with hearing aids: As shown in Fig. 3, in all groups but N 6 the level for best speech perception L max increases with age at about 0.5 dB per decade. This may indicate that older people may benefit from higher levels for speech perception, i.e. greater amplification, when provided with a hearing aid. As far as we know, current amplification strategies do not take account of age.
The age dependence of the WRS max found in our study may be used to improve studies evaluating the outcome of hearing-aid use: The WRS max or an equivalent measure is often used as reference for the measurement of successful hearing-aid provision or other acoustic amplification 15, 23-26 , for investigation of age-related changes in cognition 17 , and for speech-perception-related studies in general 27 . A consideration of both age and specific audiogram type could potentially decrease the variability of results. Other studies [28][29][30][31] have focussed on the WRS max as one criterion within the process of evaluation of candidates for cochlear implant or for predicting the outcome of such an implantation 32,33 . A model may provide more insights into hearing-loss-related pathologies amongst hearing aid users, especially in cases in which the scores with a hearing aid are far below the WRS max or in which the WRS max is unexpectedly low with respect to clinical experience or model prediction. Furthermore, epidemiological studies 34-38 on hearing loss can be complemented, providing indirect functional assessment.

Limitations of the study
An important limitation of this study is the restriction to a specific language and test. With respect to other languages and speech material, the comparison of recent studies 32,39 suggests that the test we used is comparable to the English Consonant-Vowel-Nucleus-Consonant (CNC) test.
Secondly, the outdated but established calibration 40 must be taken into account for a comparison, e.g. with CNC results.
The disadvantage of binary decision trees is the high chance of overfitting. The use of a random-forest method does decrease this risk. However, a factor of up to 1.7 between the MAEs in the evaluation group in relation to the training group still indicates some degree of overfitting. Even the considerable size of the study population and the clustering of input variables do not entirely prevent this risk. Additionally, there are some intrinsic sources of unexplained variability. Even though we excluded unreasonable cases, the population may still have included mild cases of aggravation, simulation or dissimulation. There was also a small number of cases with retrocochlear lesions. This number can be estimated as less than 0.5% in our population by comparison with our patient files and the reported incidence 41 .
An RFR model inevitably reflects the characteristics of the clinical population that contributed to the training; these characteristics may differ from those of peers outside a clinic. Finally, the model reflects statistical characteristics of a population, not causal relationships.

Conclusion
A random-forest regression model allowed the estimation of age-related decline of speech perception in quiet, completely separated from the effect of pure-tone hearing loss. Small but considerable declines were found across the whole duration of adulthood and for all audiogram types. The decline can be attributed to a distortion component of presbyacusis which is not measurable by pure-tone audiometry. The careful derivation of working hypotheses from our data has the potential to provide more insights into the relationships between pure-tone hearing loss, specific audiogram types and age.

Methods
Audiometric data were retrieved from a clinical database at the Audiological Department of Erlangen University Hospital. From the routine audiometric measurements, pure-tone thresholds for both bone and air conduction and speech perception for monosyllabic words (Freiburg Test) were extracted. All measurements were conducted in clinical routine in sound-shielded booths with clinical class A audiometers (AT900 / AT1000, AURITEC Medizindiagnostische Systeme GmbH, Hamburg, Germany). Approval for this study was received from the Institutional Review Board of the University of Erlangen.
All methods were carried out in accordance with relevant guidelines and regulations. In particular, informed consent was obtained from all patients.
Among 91,991 patients who underwent audiometry at our centre from April 2002 to June 2020 we identified 53,782 adults aged at least 18 years at the time of first investigation. Initially, the data were screened for repeated measurements; only the first audiometric assessment was retained. Subsequently, the data from 107,564 ears (hereinafter "cases") were checked for a complete set of air and bone conduction thresholds. After removal of incomplete data sets there remained 107,010 cases. In the next step, cases with missing or incomplete speech audiometry data were deleted, whereafter 26,324 cases remained. The data were then screened for cases of mixed hearing loss, defined as a difference between air and bone conduction thresholds greater than 10 dB for frequencies within the range 0.5-3 kHz. After removal of mixed-hearing-loss cases, the remaining 19,929 cases were checked for inconsistent results (< 1%), caused e.g. by simulation or lack of collaboration by the patient. If, within the discrimination function for monosyllabic words, a score larger than zero was observed while the presentation level was below the hearing threshold, the data set for that case was not used. For some cases it was observed that the measurement of the discrimination function had not been fully completed, so that a score of 100% was not reached although the presentation level was well (> 15 dB) below the discomfort level; these cases were likewise excluded. The 19,801 cases finally remaining were used for model-building and for error analysis.
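The mixed-hearing-loss criterion can be expressed as a small check. Whether the 10 dB air-bone gap was applied at each frequency separately or to an average is not stated above, so this Python sketch assumes the per-frequency reading:

```python
import numpy as np

def is_mixed_loss(air_db, bone_db, max_gap_db=10):
    """Flag a case as mixed hearing loss when the air-bone gap exceeds
    max_gap_db at any of the supplied 0.5-3 kHz frequencies
    (per-frequency criterion assumed)."""
    gap = np.asarray(air_db, dtype=float) - np.asarray(bone_db, dtype=float)
    return bool(np.any(gap > max_gap_db))

# 15 dB gap at the first frequency -> flagged as mixed loss (toy values).
mixed = is_mixed_loss([40, 45, 50], [25, 40, 45])
# Gaps of at most 10 dB throughout -> retained as sensorineural.
sensorineural = is_mixed_loss([40, 45, 50], [35, 40, 45])
```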
WRS 65 describes speech recognition at a typical conversational level. While WRS 65 is primarily dependent on the attenuation component and reflects the loss of speech perception ability in everyday life, WRS max describes the maximum information that can be processed by the auditory system. The difference WRS max - WRS 65 can be used to estimate the acceptance of acoustic amplification 23 .
In order to summarise the audiometric constellation of our study population and to describe the error of the model, we used the WHO classification. The average of the hearing thresholds measured at 0.5, 1, 2 and 4 kHz is referred to as the 4FPTA. This average was used to classify each case according to the WHO categories: WHO 1 (26 dB < 4FPTA ≤ 40 dB), WHO 2 (40 dB < 4FPTA ≤ 60 dB), WHO 3 (60 dB < 4FPTA ≤ 80 dB) or WHO 4 (80 dB < 4FPTA).
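The grading step can be written directly from these cut-offs; note that the WHO 0 boundary (4FPTA ≤ 26 dB) is inferred here as the complement of the WHO 1 range, since it is not spelled out above:

```python
def fpta4(thr_05k, thr_1k, thr_2k, thr_4k):
    """Four-frequency pure-tone average (4FPTA) in dB HL."""
    return (thr_05k + thr_1k + thr_2k + thr_4k) / 4.0

def who_grade(pta):
    """Map a 4FPTA value (dB HL) to a WHO grade using the cut-offs above.
    WHO 0 (pta <= 26 dB) is an inferred boundary, not stated in the text."""
    if pta <= 26:
        return 0
    if pta <= 40:
        return 1
    if pta <= 60:
        return 2
    if pta <= 80:
        return 3
    return 4
```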
As an example of application of the model we used a reference audiogram data set described by Bisgaard et al. 19 . Those authors identified ten standard audiograms with different slopes for a large clinical data set. Seven of them (N 1 -N 7 ) have a moderately sloping audiogram, while three (S 1 -S 3 ) have a steeper slope with a threshold difference of at least 60 dB.

Data analysis
For data analysis, model calculation, statistics and figures, the software Matlab R2019B (The MathWorks Inc., Natick, Massachusetts) was used. Data were rounded before the model calculation: hearing thresholds were rounded to the nearest 5 dB 42 , and for linear regression the patients' ages were classified into decades corresponding to the spacing of the RFR model input.
RFR was used to relate speech scores to hearing thresholds and age. To test the model results for significant age dependence, linear regression models were fitted to all curves. They were then tested against a constant model, including a Bonferroni-Holm correction.
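The Bonferroni-Holm step can be sketched as a standalone helper (a generic Python implementation of Holm's step-down procedure, not code from the study):

```python
import numpy as np

def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down correction: compare the k-th smallest p-value
    (0-indexed) against alpha / (m - k) and stop at the first failure.
    Returns a boolean rejection decision per original hypothesis."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    reject = np.zeros(m, dtype=bool)
    for k, idx in enumerate(order):
        if p[idx] <= alpha / (m - k):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail as well
    return reject

# Example: four slope tests; only the two smallest p-values survive.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

Unlike a plain Bonferroni correction (alpha / m for every test), the step-down variant is uniformly more powerful while still controlling the family-wise error rate.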