**4.1 Animal anaesthesia & surgical procedure**

This animal study was approved by the Committee on Animal Experimentation of the regional council of Baden-Württemberg in Karlsruhe, Germany (G-161/18 and G-262/19). All animals used in the experimental laboratory were managed according to German laws for animal use and care, and according to the directives of the European Community Council (2010/63/EU) and ARRIVE guidelines 60. Data from 46 pigs were included in the analyses.

Experimental animals were operated on under general anaesthesia with extensive monitoring including invasive blood pressure measurement. Midline laparotomy was performed to access the abdominal cavity. Ligaments around the liver and the hepato-gastric ligament were dissected and visceral organs mobilized, including removal of the covering of the kidneys while carefully sparing vessels. Scissors, electrocautery and bipolar vessel-sealing devices were used. A suprapubic catheter was inserted into the bladder. After surgery, pigs were euthanized with a lethal dose of i.v. potassium chloride solution.

**4.2 Hyperspectral Imaging**

The hyperspectral datacubes were acquired with the TIVITA® Tissue system (Diaspective Vision GmbH, Pepelow, Germany), which is a push-broom scanning imaging system and the first commercially available hyperspectral camera for medicine. It provides a high spectral resolution in the visible as well as near-infrared (NIR) range from 500 nm to 995 nm in 5 nm steps resulting in 100 spectral bands. Its field of view contains 640 x 480 pixels with a spatial resolution of approximately 0.45 mm/pixel (**Figure 6**). The distance of the camera to the specimen is controlled via a red-and-green light targeting system. Six halogen lamps directly integrated into the camera system provide a uniform illumination. Recording takes around seven seconds.
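The stated specifications imply a datacube of 480 × 640 pixels with 100 spectral bands from 500 nm to 995 nm. A minimal sketch of this data layout (array names hypothetical, values synthetic):

```python
import numpy as np

# Wavelength axis: 500-995 nm in 5 nm steps -> 100 bands
wavelengths = np.arange(500, 1000, 5)
assert wavelengths.size == 100

# A datacube with these specifications: height x width x bands
# (placeholder zeros stand in for measured reflectance values)
datacube = np.zeros((480, 640, 100), dtype=np.float32)

# One pixel spectrum is a 100-dimensional reflectance vector
pixel_spectrum = datacube[240, 320, :]
print(pixel_spectrum.shape)  # (100,)
```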

**4.3 Image acquisition, annotation and processing**

Images were recorded at a distance of 50 ± 5 cm between camera and organs. To prevent distortion of the measured reflectance spectra by stray light, the tissue recordings were made with the operating room lights switched off and the curtains closed. While the majority of pig recordings followed a generic approach in order to accurately represent intraoperative reality, recordings for the mixed model analysis followed a highly standardized protocol for a subset of 11 pigs (P36 to P46, as indicated in **Supplementary Text 1**; 8 to 9 pigs per organ). This standardized protocol comprises recordings of 3 repetitions of exactly the same surgical scene (“repetition” effect) from 3 different angles (“angle” effect: perpendicular to the tissue surface, 25° from one side and 25° from the opposite side) for 4 different organ positions / situs / situations (“image” effect), resulting in a total of 36 recordings for each of the 20 organs. Recordings of bile fluid were performed with the fluid applied and soaked onto 5 stacked surgical compresses, ensuring that there was no influence from the background. For a more extensive overview of the dataset and a schematic recording protocol for the standardized subset please refer to **Supplementary Figure 1** and **Supplementary Figure 2**.
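The standardized recording grid (3 repetitions × 3 angles × 4 organ positions = 36 recordings per organ) can be enumerated explicitly; a minimal sketch with hypothetical label names:

```python
from itertools import product

# Factors of the standardized protocol (labels are illustrative)
repetitions = [1, 2, 3]                                   # "repetition" effect
angles = ["perpendicular", "25_deg_side_A", "25_deg_side_B"]  # "angle" effect
image_situations = [1, 2, 3, 4]                           # "image" effect

# Every combination is one recording of a given organ
recordings = list(product(image_situations, angles, repetitions))
print(len(recordings))  # 36
```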

All of the 9,059 recorded images were sorted into the 20 respective organ folders and manually annotated resulting in 17,777 organ annotations (as several organs could be contained within one image). A precise annotation protocol can be found in **Supplementary Text 4**. Annotations were done by one medical expert and then verified by two other medical experts. In case of improper annotation, the annotation was redone collectively for that specific recording.

For t-SNE and the machine learning analysis, spectral information was first L1-normalized at pixel level for increased uniformity. All other analyses (including the structured model from **Supplementary Text 2** and **Supplementary Figure 3**) required unprocessed reflectance values from the original datacubes. After annotation, the wavelength-specific annotation-wise median was automatically calculated over all pixels included in the annotation. These median annotation-wise spectra (either L1-normalized beforehand or not) represent the basic data format that all analyses in this paper were based upon. The mean (and SD) integral of the organ reflectance curves of individual animals was calculated to quantify the overall brightness, i.e. the amount of light reflected by the organ in relative units; greater values indicate greater reflectance intensity. Although this quantification of the overall level of the reflectance curve (and therefore the area under the curve) is influenced by the distance between camera and tissue, the standardization of this distance reduced this influence, rendering this integral a valuable measure.
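The preprocessing steps above (pixel-wise L1 normalization, annotation-wise median, reflectance integral) can be sketched as follows; function names are hypothetical and the integral is approximated as a simple Riemann sum over the 5 nm wavelength steps:

```python
import numpy as np

def annotation_median_spectrum(pixel_spectra, l1_normalize=True):
    """Median spectrum over all pixels of one annotation.

    pixel_spectra: (n_pixels, 100) array of reflectance values.
    """
    if l1_normalize:
        # Pixel-wise L1 normalization: each pixel spectrum sums to 1
        pixel_spectra = pixel_spectra / np.abs(pixel_spectra).sum(axis=1, keepdims=True)
    return np.median(pixel_spectra, axis=0)

def reflectance_integral(median_spectrum, step_nm=5.0):
    # Area under the reflectance curve in relative units
    return float(median_spectrum.sum() * step_nm)

# Synthetic stand-in for the pixels of one annotation
spectra = np.random.rand(500, 100)
median_raw = annotation_median_spectrum(spectra, l1_normalize=False)
median_l1 = annotation_median_spectrum(spectra, l1_normalize=True)
brightness = reflectance_integral(median_raw)  # greater -> more reflected light
```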

**4.4 t-SNE**

t-distributed Stochastic Neighbor Embedding (t-SNE) 10 is a machine learning method commonly used to reduce the dimensionality of high-dimensional data and was used here to visualize the characteristic reflectance spectra of each pig organ. This non-linear dimensionality reduction tool has already proven valuable for the analysis of HSI and mass spectrometry data 61 and was chosen for visualization as it has shown particular promise for biological samples in the past 62,63. The algorithm models manifolds of high-dimensional data and produces low-dimensional embeddings that are optimized for preserving the local neighbourhood structure of the high-dimensional manifold 10. In comparison to linear methods like PCA 64 and LDA 65, t-SNE preserves more of the relevant structure of datasets with non-linear features. For these reasons, t-SNE was used for dimensionality reduction.

Before optimizing the parameters of t-SNE, the dataset was prepared in the following manner: one characteristic reflectance spectrum was obtained for each annotation by calculating the median spectrum from the (previously pixel-level L1-normalized) spectra of all pixels in the annotation. Consequently, each data point represents the reflectance of one organ in one image of one pig. The two-dimensional visualization of the reflectance spectra of the complete dataset was optimized by performing a random search over the following parameters:

- Parameter 1: The early exaggeration, which controls how tight natural clusters in the original space are in the embedded space and how much space will be between them. 50 random integer values were sampled in the range [5; 100].
- Parameter 2: The learning rate, which is used in the optimization process. 100 random integer values were sampled in the range [10; 1000].
- Parameter 3: The perplexity, which is related to the number of nearest neighbors considered for each data point in the optimization. 50 equidistant integer values were sampled in the range [2; 100].

The early exaggeration was the first parameter optimized by visual inspection of the two-dimensional representation of the dataset. The learning rate was then optimized in the same manner while keeping the early exaggeration constant. Subsequently, the perplexity was optimized by keeping the other two parameters constant. The optimal values for each of the parameters were 34 for the early exaggeration, 92 for the learning rate and 30 for the perplexity.
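With the optimal values reported above, the embedding step can be sketched with scikit-learn's `TSNE` implementation (synthetic data stands in for the annotation-wise median spectra; the original analysis may have used a different t-SNE implementation):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for annotation-wise median spectra: n_annotations x 100 bands
X = rng.random((300, 100))

# Reported optimum: early_exaggeration=34, learning_rate=92, perplexity=30
embedding = TSNE(
    n_components=2,
    early_exaggeration=34,
    learning_rate=92,
    perplexity=30,
    random_state=0,
).fit_transform(X)
print(embedding.shape)  # (300, 2)
```

Note that perplexity must be smaller than the number of data points, which is trivially satisfied here and for the full dataset.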

**4.5 Linear mixed models**

Independent linear mixed models were used for an explained variation analysis in order to evaluate the effect of the influencing factors on changes in the spectrum. The (proportion of) explained variance was obtained using the empirical decomposition of the explained variation in the variance components form of the mixed model 11.

For the first approach, for each wavelength, an independent linear mixed model was fitted with fixed effects for “organ” and “angle” as well as random effects for “pig” and “image”. More precisely, for each wavelength the following model was fitted (suppressing the wavelength index):

$$y_{ijk} = \beta_0 + \mathbf{x}_{ijk}\,\boldsymbol{\beta}_{\text{organ}} + \mathbf{w}_{ijk}\,\boldsymbol{\beta}_{\text{angle}} + b_i + c_{ij} + \varepsilon_{ijk}$$

for repetition $k = 1,\dots,3$ of image $j = 1,\dots,n_i$ of pig $i = 1,\dots,11$ (with $n_i$ the number of images of pig $i$, ranging from 84 to 228). $\beta_0$ is an intercept, $\mathbf{x}_{ijk}$ is a row vector of length 19 indicating the organ of observation $ijk$ (with arbitrary reference category “stomach”) and $\boldsymbol{\beta}_{\text{organ}}$ is a vector of corresponding fixed organ effects. Similarly, $\boldsymbol{\beta}_{\text{angle}}$ are fixed effects for angle (“25° from one side” and “25° from the opposite side” relative to reference category “perpendicular to the tissue surface”, indicated by the row vector $\mathbf{w}_{ijk}$). $b_i$ and $c_{ij}$ are random pig and image effects, respectively, assumed to be independently normally distributed with between-pig variation $\sigma^2_{\text{pig}}$ and between-image variation $\sigma^2_{\text{image}}$. Residuals $\varepsilon_{ijk}$ capture the variability between repeated recordings of the same image.

The proportion of variability in reflectance explained by each factor was derived as in 11. “Repetition” depicts the residual variability, which is here the within-image variability (i.e. across repetitions). 95 % pointwise confidence intervals based on parametric bootstrapping with 500 replications indicate the uncertainty in the estimates.

For the second approach with stratification by organ, independent linear mixed models were fitted for each organ and wavelength with fixed effects for “angle” as well as random effects for “pig” and “image”, i.e. for each organ and wavelength the same model as given above was fitted, excluding the covariate “organ”. The explained variation of each factor was depicted as in 11. “Repetition” depicts the residual variation, which is here the within-image variability (i.e. across repetitions). 95 % pointwise confidence intervals based on parametric bootstrapping with 500 replications indicate the uncertainty in the estimates. Curves were linearly interpolated if the model fit was singular. All linear mixed model analyses were based on image-wise organ-specific median reflectance spectra, obtained by calculating the median spectrum of all pixel spectra within one annotation.
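For illustration, a single per-wavelength, per-organ model of this kind (random pig intercept plus an image variance component nested within pig) can be fitted with `statsmodels`; the variance decomposition of reference 11 and the bootstrap are not reproduced here, data are synthetic, and the original analysis may have used different software:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic long-format data for one organ at one wavelength:
# 11 pigs x 4 images x 3 repetitions with pig and image random effects
rows = []
for pig in range(11):
    pig_eff = rng.normal(0, 0.5)
    for image in range(4):
        img_eff = rng.normal(0, 0.3)
        for angle in ["perp", "side_a", "side_b"]:
            rows.append({
                "pig": pig,
                "image": f"{pig}_{image}",   # unique image id
                "angle": angle,
                "reflectance": 1.0 + pig_eff + img_eff + rng.normal(0, 0.1),
            })
df = pd.DataFrame(rows)

# Fixed effect for angle; random pig intercept via `groups`;
# image-within-pig as a variance component
model = smf.mixedlm(
    "reflectance ~ C(angle)", df,
    groups="pig",
    vc_formula={"image": "0 + C(image)"},
)
result = model.fit()
print(result.cov_re)   # between-pig variance estimate
print(result.vcomp)    # between-image variance component
print(result.scale)    # residual ("repetition") variance
```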

**4.6 Machine Learning**

Prior to training the deep learning network, we systematically split the dataset comprising 46 pigs (9,059 images with 17,777 annotations) into a training dataset consisting of 38 pigs (3,766 images with 7,882 annotations) and a disjoint test set consisting of 8 pigs (5,293 images with 9,895 annotations) as indicated in **Supplementary Figure 1**. These 8 test pigs were randomly selected from the 11 standardized pigs (P36-P46) with the only criterion being that every organ class is represented by at least one standardized pig in both the test and the training dataset. This criterion could no longer be fulfilled when selecting more than 8 standardized pigs.

The hold-out test set was used only after the network architecture and all hyperparameters had been fixed. Leave-one-pig-out cross-validation was performed on the training dataset and the predictions on the left-out pig were aggregated for all 38 folds (46 minus 8) to yield the validation accuracy. The hyperparameters of the neural network were optimized in an extensive grid search such that the validation accuracy was maximized. Once the optimal hyperparameters were determined, we evaluated the classification performance on the hold-out test set by ensembling the predictions from all 38 networks (one for each fold) via computing the mean logits vector (the input values to the softmax function, see below) followed by the argmax operation to retrieve the final label for each annotation.
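The described ensembling of the 38 fold models, i.e. averaging the logits and then taking the argmax, is a small numpy operation; a minimal sketch with synthetic logits (function name hypothetical):

```python
import numpy as np

def ensemble_predict(logits_per_fold):
    """Ensemble fold models by averaging their logits, then argmax.

    logits_per_fold: (n_folds, n_annotations, n_classes) array.
    """
    mean_logits = logits_per_fold.mean(axis=0)   # (n_annotations, n_classes)
    return mean_logits.argmax(axis=1)            # final label per annotation

rng = np.random.default_rng(0)
# 38 folds, 5 test annotations, 20 organ classes (synthetic values)
logits = rng.normal(size=(38, 5, 20))
labels = ensemble_predict(logits)
print(labels.shape)  # (5,)
```

Averaging logits before the softmax (rather than averaging post-softmax probabilities) matches the procedure described above.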

The deep learning-based classification was performed on the median spectra computed from the L1-normalized spectra of all pixels in the annotation masks resulting in 100-dimensional input feature vectors.

The deep learning architecture was composed of 3 convolutional layers (64 filters in the first, 32 in the second, and 16 in the third layer) followed by 2 fully connected layers (100 neurons in the first and 50 in the second layer). The activations of all five layers were batch normalized and a final linear layer was used to calculate the class logits. Each of the convolutional layers convolved the spectral domain with a kernel size of 5 and was followed by an average pooling layer with a kernel size of 2. The two fully connected layers zeroed out their activations with dropout (the dropout probability was optimized in the hyperparameter grid search). All non-linear layers used the Exponential Linear Unit (ELU) 66 as activation function.

We chose this architecture as it provides a simple yet effective way to analyze the spectral information. The convolution operation acts on the local structure of the spectra and we used a relatively small kernel size and stacked 3 layers to increase the receptive field while being computationally efficient 67. The two fully connected layers make a final decision based on the global context. The advantage of this approach is that it combines local and global information aggregation while still being computationally efficient since the entire network only uses 34,300 trainable weights.
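A PyTorch sketch of the described architecture follows. Padding, pooling details and the dropout value (0.2 here) are assumptions, so the parameter count comes out close to, but not necessarily exactly at, the reported 34,300 trainable weights:

```python
import torch
import torch.nn as nn

class SpectralNet(nn.Module):
    """Sketch: 3 conv layers + 2 FC layers + final linear logit layer."""

    def __init__(self, n_bands=100, n_classes=20, dropout=0.2):
        super().__init__()
        self.features = nn.Sequential(
            # Kernel size 5 in the spectral domain, average pooling with kernel 2
            nn.Conv1d(1, 64, kernel_size=5), nn.BatchNorm1d(64), nn.ELU(), nn.AvgPool1d(2),
            nn.Conv1d(64, 32, kernel_size=5), nn.BatchNorm1d(32), nn.ELU(), nn.AvgPool1d(2),
            nn.Conv1d(32, 16, kernel_size=5), nn.BatchNorm1d(16), nn.ELU(), nn.AvgPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # With no padding, 100 bands shrink to 9 positions x 16 channels
            nn.Linear(16 * 9, 100), nn.BatchNorm1d(100), nn.ELU(), nn.Dropout(dropout),
            nn.Linear(100, 50), nn.BatchNorm1d(50), nn.ELU(), nn.Dropout(dropout),
            nn.Linear(50, n_classes),  # final linear layer -> class logits
        )

    def forward(self, x):      # x: (batch, 100) median spectra
        x = x.unsqueeze(1)     # -> (batch, 1, 100) for 1D convolution
        return self.classifier(self.features(x))

net = SpectralNet()
logits = net(torch.randn(4, 100))
print(logits.shape)  # torch.Size([4, 20])
```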

The softmax function was used to provide the *a posteriori* probability for each class. We used the Adam optimizer (β1 = 0.9, β2 = 0.999) 68 with an exponentially decaying learning rate and the multiclass cross-entropy loss function. To address class imbalance, we included an optional weighting of the loss function according to the number of training images per class and sampled instances for the batches either randomly or with oversampling such that each organ class had the same probability of being sampled. Both design choices were investigated in the hyperparameter grid search.
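The training setup can be sketched in PyTorch as follows; the stand-in model, the class counts, the initial learning rate (1e-3) and the decay rate (0.95) are placeholders, since the actual values were determined in the grid search:

```python
import torch

net = torch.nn.Linear(100, 20)                        # stand-in classifier
class_counts = torch.arange(50, 450, 20).float()      # stand-in images per class

# Optional loss weighting inversely proportional to training images per class
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = torch.nn.CrossEntropyLoss(weight=weights)

# Adam with the stated betas and an exponential learning rate decay
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

# One illustrative optimization step
x, y = torch.randn(8, 100), torch.randint(0, 20, (8,))
loss = criterion(net(x), y)
loss.backward()
optimizer.step()
scheduler.step()  # lr <- lr * gamma
```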

We trained on 10,000,000 samples per epoch for 10 epochs. In an extensive grid search, we determined the best-performing combination of dropout probability, learning rate, decay rate and batch size, together with a weighted loss function and no oversampling.