2.1 Ethical approval
The study protocol was approved by the local ethical committee of Bern (Kantonale Ethikkommission Bern, KEK-BE 2016-00887) and the local ethical committee of the Paul Scherrer Institute (Ethikkommission Nordwest- und Zentralschweiz, 2017-00805), as well as the Mass General Brigham Institutional Review Board (#2022P001306).
2.2 Dynamic synchrotron-based X-ray microtomography
Three fresh-frozen human TBs from anonymous donors were provided by the Eaton Peabody Laboratories, Mass Eye and Ear, Boston, MA, USA. The donor of TB1 was a 51-year-old white male (right ear), the donor of TB2 was an 89-year-old black female (right ear), and the donor of TB3 was a 58-year-old white male (left ear). The raw data are stored as B-Fresh1, B-Fresh2, and B-Fresh3 in the PSI petabyte archive system (a tape-based long-term storage system at the Swiss National Supercomputing Centre CSCS in Lugano, Switzerland). For simplicity, we have renamed the samples in this article: TB1, TB2, and TB3 correspond to the raw data of B-Fresh1, B-Fresh2, and B-Fresh3, respectively. For better readability, we refer to dynamic synchrotron-based phase-contrast X-ray microtomography simply as dynamic microtomography henceforth.
2.2.1 Sample preparation
The three fresh-frozen TBs were dissected as follows. Laterally, the concha was removed, conserving the bony and cartilaginous external auditory canal. Posteriorly and superiorly, the air cells of the mastoid portion were removed entirely up to the tegmen tympani and the antrum. Inferiorly, the soft tissue was removed up to the internal carotid artery, the jugular bulb, and the insertion of the Eustachian tube. Medially, the petrous part of the temporal bone was removed up to the bony capsule of the labyrinth. The semicircular canals and the internal auditory canal were skeletonized. The final sample included an intact external auditory canal, middle ear, and inner ear, and measured approximately 5 × 2 cm. The surrounding temporal bone was reduced to a maximum thickness of 1 mm to minimize X-ray absorption.
An earplug was sewn into the external auditory canal. During the image acquisition, the specimen was placed in a custom-made cylindrical holder (diameter of 25 mm) and mounted on the rotation stage at the TOMCAT beamline. To prevent the samples from drying out during the acquisition, they were wrapped in neuro-patties soaked in a sterile saline solution, and the top of the holder was sealed with a plastic film (see Figure 1).
2.2.2 Sound stimulation and calibration
Before scanning each sample, we calibrated the sound stimulation with a clinical probe microphone (ER7C, Etymotic Research) by measuring the voltage that had to be applied to the sound source to reach the desired sound pressure level (dB SPL) in the auditory canal at a particular frequency. We measured at 256 Hz and 512 Hz, each at 110 dB and 120 dB SPL. For 256 Hz, we used a sub-woofer with an inverted cone attached; for 512 Hz, we used ER3C insert earphones (Etymotic Research) coupled to an amplifier. Either source was connected to a sine wave generator (MeasComp USB DAQ Module MC1608 USB-1608G, SKU: 6069-410-059). A silicone tube connected the sound stimulation unit to the earplug sewn into the external auditory canal.
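For orientation, the target levels can be expressed as sound pressures. The following is a minimal sketch, assuming the standard dB SPL reference of 20 µPa (RMS) in air; the function names are illustrative and are not part of the calibration software used in this study.

```python
import math

P_REF_PA = 20e-6  # assumed reference pressure: 20 µPa RMS (standard dB SPL reference in air)


def spl_to_pressure_rms(level_db_spl: float) -> float:
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF_PA * 10 ** (level_db_spl / 20)


def pressure_rms_to_spl(pressure_pa: float) -> float:
    """Convert an RMS pressure in pascals to a level in dB SPL."""
    return 20 * math.log10(pressure_pa / P_REF_PA)


if __name__ == "__main__":
    for level in (110, 120):
        print(f"{level} dB SPL = {spl_to_pressure_rms(level):.2f} Pa RMS")
```

Under this convention, the 10 dB step between the two stimulation levels corresponds to a factor of about 3.16 in sound pressure.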
2.2.3 Image acquisition and reconstruction
Dynamic synchrotron-based X-ray phase-contrast microtomography was conducted at the TOMCAT beamline (X02DA) within the Swiss Light Source (Paul Scherrer Institute, Switzerland). A multi-scale strategy was implemented to accommodate the dimensions of human TBs. Initially, a low-resolution (LR) setup was employed to capture overview scans of the sample, followed by local high-resolution (HR) scans of the middle ear. An in-house developed Fiji plugin, utilizing the 3D reconstructed LR dataset as input, facilitated the determination of spatial coordinates for regions of interest to be imaged with the HR setup [28]. The LR overview scans covered a field-of-view (FOV) of approximately 29 × 12.5 mm², using a half-acquisition technique, which entails a 360° rotation rather than the standard 180° in tomography acquisitions. The setup comprised a PCO 5.5 Edge camera coupled with a 1:1 microscope positioned 3 m from the sample, resulting in an effective pixel size of 5.8 µm. Scan parameters were adjusted to minimize radiation exposure, with a 30 ms exposure time and 1000 projections spanning 360°.
The dynamic HR acquisitions were performed with an in-house developed fast read-out system consisting of the GigaFRoST camera [26], a LuAG:Ce scintillator with a thickness of 150 µm, and a 4x magnification high numerical aperture macroscope from Optique Peter [27]. These components were configured at a propagation distance of 250 mm, yielding an effective pixel size of 2.75 µm [27]. The FOV achieved was approximately 11 × 3.3 mm² using the "half-acquisition" method. LR and HR acquisitions used a polychromatic beam filtered with a 5 mm Sigradur and a 4 mm glass filter. Additional filtration for LR acquisitions included a 15 mm Sigradur and a 75 µm molybdenum filter to minimize sample dose. The resulting average energy was approximately 24 keV.
We assumed that the middle ear vibrates periodically at the frequency of the sound stimulation. Each motion cycle is therefore much shorter than the time required to acquire a complete set of angular projections for a tomogram (which typically entails several thousand images). To accommodate this, dynamic tomograms were constructed by gathering a large number of projections across many consecutive motion cycles while the rotation stage slowly rotated. A total of 40,000 projections were captured during a single 360° rotation for each scan, which took 20 seconds at 256 Hz and 28 seconds at 512 Hz.
At the maximum FOV, the maximum frame rate of the GigaFRoST camera is 2 kHz before the data transfer saturates. This corresponds to a minimum exposure period of 0.5 ms between consecutive image acquisitions. Consequently, the exposure period was always kept at or above 0.5 ms to prevent saturation of the read-out system while maintaining a consistent FOV across all frequencies. The exposure time, i.e., the effective photon collection duration, was adjusted based on the frequency of sound stimulation. It always stayed within one-tenth of the sound stimulation period to prevent image blurring due to motion. Thus, exposure time decreased with increasing sound stimulation frequency, from 0.3 ms at 256 Hz to 0.19 ms at 512 Hz.
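The timing constraints stated above can be checked with simple arithmetic. The sketch below is illustrative only: it uses the numbers reported in the text (40,000 projections, scan times of 20 s and 28 s, exposure times of 0.3 ms and 0.19 ms), and the function names are hypothetical.

```python
def frame_period_ms(n_projections: int, scan_time_s: float) -> float:
    """Average time between consecutive projections, in milliseconds."""
    return 1000.0 * scan_time_s / n_projections


def max_exposure_ms(stimulus_hz: float, fraction: float = 0.1) -> float:
    """Longest exposure that stays within `fraction` of one stimulation period."""
    return 1000.0 * fraction / stimulus_hz


# stimulus frequency (Hz) -> (scan time in s, exposure time in ms), as reported in the text
SETTINGS = {256: (20.0, 0.30), 512: (28.0, 0.19)}

if __name__ == "__main__":
    for freq, (scan_s, exposure) in SETTINGS.items():
        period = frame_period_ms(40_000, scan_s)
        limit = max_exposure_ms(freq)
        print(f"{freq} Hz: frame period {period:.2f} ms, "
              f"exposure {exposure:.2f} ms, blur limit {limit:.3f} ms")
```

With these values, the frame period is 0.5 ms at 256 Hz and 0.7 ms at 512 Hz, and both exposure times lie below one-tenth of the respective stimulation period.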
To correct for X-ray beam inhomogeneities and the dark current of the camera, the projections were first dark- and flat-field corrected. Sinograms were then computed for each set of projections and reconstructed using the filtered back-projection Gridrec algorithm [29], with the Sarepy algorithm used for ring removal [30].
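As an illustration of the first step, dark- and flat-field correction is commonly computed per pixel as (projection − dark) / (flat − dark). The following is a minimal pure-Python sketch of that conventional formula on small nested lists; it is not the beamline reconstruction pipeline, and the function name and the `eps` guard are assumptions made for this example.

```python
def flat_dark_correct(projection, flat, dark, eps=1e-6):
    """Per-pixel normalization (projection - dark) / (flat - dark).

    All inputs are 2D nested lists of equal shape. `eps` is a hypothetical
    guard against division by zero in dead or unlit pixels.
    """
    corrected = []
    for p_row, f_row, d_row in zip(projection, flat, dark):
        corrected.append(
            [(p - d) / max(f - d, eps) for p, f, d in zip(p_row, f_row, d_row)]
        )
    return corrected


if __name__ == "__main__":
    dark = [[10.0, 10.0], [10.0, 10.0]]
    flat = [[110.0, 110.0], [110.0, 110.0]]
    projection = [[60.0, 35.0], [110.0, 10.0]]
    print(flat_dark_correct(projection, flat, dark))
```

The result is the transmitted intensity relative to the unattenuated beam, so a pixel equal to the flat field maps to 1 and a pixel equal to the dark field maps to 0.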
Two signals were collected during the image acquisition: the sinusoidal signal (or gating signal) transmitted from the signal generator to the sound unit, and the camera exposure signal, giving the exact time of each image acquisition. These two signals allowed us to associate each image with a specific phase of the sine stimulation, corresponding to a specific phase of the vibration of the middle ear. The gating signal period was decomposed into ten time windows called phases pj, with p0 being the reference phase taken at the ascending zero-crossing point of the sinusoidal curve. A post-gating algorithm was applied to the 40,000 raw projections to sort them into the correct phases and build ten post-gated tomograms of approximately 4000 projections each. These 4000 projections were evenly distributed over the full 360° rotation of the sample, so that each post-gated tomogram provided a 3D reconstruction of one specific phase of the middle ear motion cycle.
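The sorting step can be sketched as follows. This is a simplified illustration and not the post-gating code used in this study: it assumes that the exposure timestamps and the time of one ascending zero crossing of the gating signal (`t_zero_s`) are already known, that the stimulus frequency is constant, and that the ten windows start at the zero crossing. All names are hypothetical.

```python
def assign_phase_bins(exposure_times_s, stimulus_hz, t_zero_s=0.0, n_phases=10):
    """Return, for each exposure time, the index j of the phase window pj."""
    bins = []
    for t in exposure_times_s:
        # Fraction of the stimulation cycle elapsed since the last ascending zero crossing.
        cycle_fraction = ((t - t_zero_s) * stimulus_hz) % 1.0
        # Clamp guards against a fraction that rounds up to exactly 1.0.
        bins.append(min(int(cycle_fraction * n_phases), n_phases - 1))
    return bins


def group_by_phase(exposure_times_s, stimulus_hz, t_zero_s=0.0, n_phases=10):
    """Return one list of projection indices per phase window."""
    groups = [[] for _ in range(n_phases)]
    bins = assign_phase_bins(exposure_times_s, stimulus_hz, t_zero_s, n_phases)
    for index, phase in enumerate(bins):
        groups[phase].append(index)
    return groups


if __name__ == "__main__":
    # 40,000 projections at a 0.5 ms frame period with a 256 Hz stimulus.
    times = [i * 0.0005 for i in range(40_000)]
    groups = group_by_phase(times, 256.0)
    print([len(g) for g in groups])
```

Because the projection indices within each group remain in acquisition order, each group still samples the full rotation, which is what allows one tomogram to be reconstructed per phase.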
2.2.4 Data analysis
A detailed description of the analysis pipeline is given by Schmeltz & Ivanovic et al. [25]. The pipeline developed to analyze the dynamic microtomography data extracts the motion in all three directions. To allow for a more accurate comparison with the laser-Doppler vibrometry (LDV) measurements, we adapted the pipeline to also extract displacements along a single direction, matched as closely as possible to the direction in which the LDV measurements were taken. For the umbo, this is the direction perpendicular to the plane of the tympanic annulus. For the stapes, it is the direction perpendicular to the stapes footplate.
To assess the movement of the ossicular chain in response to sound stimulation, we assumed that the ossicles act as independent rigid bodies. Under this assumption, their three-dimensional motion over time can be characterized by rigid transformations composed of a rotation followed by a translation, i.e., all points within a given ossicle undergo the same transformation. Therefore, analyzing only a subset volume (SV) of an ossicle is sufficient to deduce the transformation of the entire ossicle. As previously mentioned, each motion cycle was divided into ten distinct time intervals denoted phases pj. The intensity-based registration algorithm imregtform from MATLAB was employed to perform a 3D registration of the SV of an ossicle imaged at phase p0 with the corresponding SV imaged at each of the other phases pj, where j ∈ [1, 9]. This estimates the geometric transformation aligning the two phases without the need for segmentation or manual placement of landmarks. The algorithm uses all available information by considering the unaltered intensity of every image voxel, thereby enabling sub-voxel registration [31]. Three SVs were manually selected for each ossicle. The transformations of all SVs were then averaged to obtain an average transformation per phase for each ossicle.
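To make the rigid-body assumption concrete, the sketch below applies one rigid transformation (a rotation followed by a translation) to two points of the same body. It is a hypothetical illustration with made-up values, not output of imregtform: although both points undergo the same transformation, their displacement vectors differ as soon as the rotation is not the identity, which is why the displacement must be evaluated at a specific point of the ossicle.

```python
def apply_rigid(rotation, translation, point):
    """Return rotation * point + translation for a 3x3 matrix and 3-vectors."""
    return [
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    ]


def displacement(rotation, translation, point):
    """Displacement vector of `point` under the rigid transformation."""
    moved = apply_rigid(rotation, translation, point)
    return [m - p for m, p in zip(moved, point)]


# Hypothetical example: 90 degree rotation about the z-axis plus a shift along x.
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
SHIFT_X = [1.0, 0.0, 0.0]

if __name__ == "__main__":
    print(displacement(ROT_Z_90, SHIFT_X, [1.0, 0.0, 0.0]))  # point off the rotation axis
    print(displacement(ROT_Z_90, SHIFT_X, [0.0, 0.0, 5.0]))  # point on the rotation axis
```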
After obtaining the mean transformations for all phases pj across the three ossicles, the sinusoidal displacement of any region of interest (ROI) within an ossicle could be determined by calculating the projection in the z-direction of the displacement vector applied to that point. For comparison with the LDV measurement points, two ROIs (the umbo and the posterior crus of the stapes) were manually chosen using Fiji from the reconstructed data stack captured at phase p0. To assess the sensitivity of the displacement computation to the manual ROI selection, five points were selected around each ROI to compute the standard deviation of the displacement estimates.
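The projection step, and one possible way to summarize the ten per-phase values as a single amplitude, can be sketched as follows. The amplitude estimate via the first Fourier coefficient is an assumption made for this example and is not necessarily the method of the pipeline in [25]; the function names are hypothetical.

```python
import cmath
import math


def project_onto(displacement, direction):
    """Signed length of `displacement` along `direction` (need not be unit length)."""
    norm = math.sqrt(sum(c * c for c in direction))
    return sum(d * c for d, c in zip(displacement, direction)) / norm


def fundamental_amplitude(samples):
    """Amplitude of the one-cycle sinusoid in equally spaced samples of one period.

    Uses the first discrete Fourier coefficient, so a constant offset
    (e.g. displacements expressed relative to phase p0) does not affect the result.
    """
    n = len(samples)
    coeff = sum(s * cmath.exp(-2j * math.pi * k / n) for k, s in enumerate(samples))
    return 2.0 * abs(coeff) / n


if __name__ == "__main__":
    # Synthetic check: ten phases of a 2.0-unit sinusoid, referenced to phase p0.
    true_amp, phase0 = 2.0, 0.3
    samples = [
        true_amp * math.sin(2 * math.pi * k / 10 + phase0) - true_amp * math.sin(phase0)
        for k in range(10)
    ]
    print(project_onto([1.0, 2.0, 3.0], [0.0, 0.0, 2.0]))
    print(fundamental_amplitude(samples))
```

Applying the same two functions to the five points selected around an ROI would give five amplitude estimates whose spread can be summarized by their standard deviation.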
We applied the pipeline to a portion of the petrous bone to ensure that the extracted transformations corresponded to vibrations of the stimulated ossicles and not to vibrations of the entire sample in the sample holder. These values set the noise limit for our analyses.
2.3 Laser-Doppler Vibrometer
2.3.1 Sample preparation
The same three specimens that underwent dynamic microtomography were refrozen and returned to Eaton Peabody Laboratories, Mass Eye and Ear, Boston, MA, USA. Further preparation of the temporal bones consisted of opening the facial recess to confirm the normality of the middle ear structures and to gain access to the stapes for laser-Doppler vibrometry (LDV) measurements [32]. A small part of the anterior-superior wall was opened and later replaced by a transparent plastic window to allow LDV measurement of umbo displacement (see Figure 3).
2.3.2 Laser measurements
Small pieces of retro-reflective tape (approximately 100 µm × 100 µm × 60 µm thick) were affixed to the lateral surface of the tympanic membrane (TM) at the umbo and to the posterior crus of the stapes [8]. Each sample was securely positioned on an air-isolation table within a soundproof booth. LDV measurements were conducted first at the umbo and then at the stapes without altering the setup. Additionally, vibrations of the petrous bone near the oval window were recorded to assess the noise floor and stimulus artifact in the measured displacements; all of the umbo and stapes motion measurements we report are at least 20 dB above the driven vibration of the petrous bone. For sound stimulation, a speaker (Radio Shack) equipped with a plastic tube was tightly sealed to the opening of the ear canal to deliver sound to the external ear. At the same time, a calibrated probe microphone (PCB 377C10) monitored sound pressures in the ear canal (Pec) within a distance of less than 2 mm from the TM surface. The hardware of the stimulus and recording system, as well as its software control, has been detailed previously by Ravicz and Rosowski, 2012 [33]. The primary stimulus was a sequence of 50 pure tones with frequencies logarithmically spaced between 200 and 20,000 Hz, during which the stimulus voltage to the loudspeaker remained constant at 0.5 V.
2.3.3 Data analysis
Fourier transforms of the recorded microphone and LDV time waveforms described the complex (magnitude and phase angle) sinusoidal sound pressure and velocity at the stimulus frequency. The velocities were converted into displacements by dividing the complex velocity by (2π × frequency × i) and then normalized by the complex sound pressure at the stimulus frequency (Pec). We report the magnitude of the displacement normalized by the sound pressure for a single stimulus level of 0.5 V. In addition, we cosine-corrected the displacement for the angle of the measuring beam to account for the expected piston-like movement of the stapes footplate.
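The conversion described above can be written compactly with complex numbers. The following is a minimal sketch with hypothetical function names and made-up input values. It assumes the usual form of the cosine correction, in which the component measured along the laser beam is divided by the cosine of the angle between the beam and the assumed direction of motion.

```python
import math


def normalized_displacement(velocity, pressure, frequency_hz):
    """Complex displacement per unit sound pressure at one stimulus frequency.

    `velocity` and `pressure` are the complex Fourier components at `frequency_hz`.
    """
    displacement = velocity / (2j * math.pi * frequency_hz)
    return displacement / pressure


def cosine_correct(value, beam_angle_deg):
    """Scale a value measured along the beam to the assumed direction of motion."""
    return value / math.cos(math.radians(beam_angle_deg))


if __name__ == "__main__":
    # Made-up example: 1 nm displacement amplitude at 100 Hz, 2 Pa sound pressure.
    velocity = 2j * math.pi * 100.0 * 1e-9
    ratio = normalized_displacement(velocity, 2.0 + 0j, 100.0)
    print(abs(ratio))                        # magnitude in m/Pa
    print(cosine_correct(abs(ratio), 30.0))  # corrected for a 30 degree beam angle
```

Dividing by i shifts the phase by a quarter cycle, so the displacement lags the velocity by 90° while its magnitude scales inversely with frequency.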