Reconstructing Herbivore Diets: A Multivariate Statistical Approach To Interpreting Compound-Specic Isotope Values

Stable nitrogen (N) isotope analysis of bulk tissues is a technique for reconstructing the diets of organisms. However, bulk nitrogen isotope (δ 15 N) values can be inuenced by a variety of metabolic and environmental factors that can confound accurate dietary reconstruction. Compound-specic isotope analyses of amino acids (CSIA-AA) have demonstrated the power of the approach in understanding how the δ 15 N values of bulk collagen are assembled from the constituent AAs. Furthermore, by connecting these AA δ 15 N values within a robust biochemical framework interpretation of diet and environment are greatly enhanced. Several new proxies have emerged, built around selected AAs; however, the interconnectedness of AA biosynthetic pathways means that patterning of δ 15 N values across a wider suite of collagen AAs will occur under different environmental or dietary inuences. This work seeks to test this idea by situating CSIA-AA within a robust statistical framework using principal component analysis (PCA) and Bayesian statistics to increase the interpretability of a wider range of AA δ 15 N values in terms of reconstructing herbivore diet. The model was tested using wild and domestic herbivores from the Neolithic settlements of Çatalhöyük (Turkey), Makriyalos (Greece), and Vaihingen (Germany) as case studies. It was found that at Makriyalos there was a sharp separation between domesticated and wild herbivores, which was present to a lesser extent at Çatalhöyük and not observed at Vaihingen. The case studies presented in this work demonstrate that multivariate statistical treatment of CSIA-AA data can deliver new insights into herbivore diet, exceeding those achievable with the Bayesian model.


Introduction
Dietary reconstruction through isotope analysis is a powerful technique which can disentangle diets of individuals living in complex ecosystems. Isotope analysis of bulk tissues (referred to here as the 'bulk' method) has been used for reconstruction of animals diets to explore herding and foddering practices, as well as the interplay between crop cultivation and animal husbandry (Hamilton et al. 2009;Makarewicz 2014). This most often involves determining the stable carbon (C) and nitrogen (N) isotope values of the extracted organic components of bone (collagen) or dental remains (dentine) and using those isotope values for distinguishing between various diet sources for reconstruction. C isotope (δ 13 C) values can be used to distinguish between C 3 /C 4 plant diets (Farquhar et al. 1989) and N isotope (δ 15 (Bogaard et al. 2007; Styring et al. 2014b). Complicating the picture further, environmental variables also play a role. Plant tissues become 13 C-depleted due to CO 2 recycling under forest cover, lowering δ 13 (Craine et al. 2009) and favour C 4 plants over C 3 species. Indirect effects, such as salinity, can favour C 4 plant growth by restricting freshwater availability (Chmura and Aharon 1995). C 3 and C 4 plants have been found to show some differences in N isotopic values, with a slight increase in bulk δ 15 N isotope values of C 4 plants over those of C 3 plants by ca.
Attempts to resolve these limitations in bulk isotope analysis led to the development of compoundspeci c isotope analysis of amino acids (CSIA-AA). CSIA-AA involves breaking down peptide linkages between AAs, after which the isotopic compositions of individual AAs can be analyzed separately (Howland et  has been given to diets of past herbivores, which are important for understanding the interplay between crop and animal husbandry (through identifying foddering) and for gaining insights into herding strategies (Kendall et al. 2017). The majority of studies to date have focused on only three AAs by calculating the difference between Glx, the averaged amino N δ 15 N signal of glutamic acid and glutamine, and phenylalanine (i.e. Δ 15 N Glx−Phe ) which may be under-exploiting the potential resolution that can be achieved using this approach (O'Connell 2017).
To explore the untapped potential that likely resides within the full suite of protein AAs, we investigate correlations between multiple AA δ 15 N values in crops (cereal grains, rachis, legumes) and non-crops (herbaceous and woody plants). We use these correlations to differentiate between potential dietary sources and compare those dietary sources against trophic adjusted AA δ 15 N values of archaeological herbivores through principal component analysis (PCA). We then apply a Bayesian model (FRUITS) that is increasingly used in archaeological studies to reconstruct diet (Fernandes et al. 2014). A predominance of any dietary plant type would imply particular management strategies and subsequent ecological implications. The use of crops as fodder would imply close integration of crop and animal husbandry, whereas diets of woody plants might suggest exploitation of the surrounding woodland. To assess the performance of these statistical tools, we test using AA N isotopic data from modern cattle (Bos taurus) fed on known diets (Kendall et al. 2017) in addition to wild and domestic herbivores from the Neolithic sites of Çatalhöyük (Turkey), Makriyalos (Greece), and Vaihingen an der Enz (Germany) (Styring et al. 2015 (Fraser et al. 2013).
The aims of this study include: 1. Use of PCA to explore metabolic relationships between AAs and inform which ones will be most useful in resolving the contribution of dietary resources to herbivore diet,  Table S1). FRUITS is designed for use with bulk isotope data, some changes to the standard FRUITS con guration were made to accommodate CSIA-AA data. In addition to consumer and source AA isotope data, FRUITS utilizes ve parameters including proxies (consumer AA δ 15 N values), sources (plant type), source fractions, weights, and concentrations. The AA δ 15 N values of phenylalanine (Phe), proline (Pro), threonine (Thr), serine (Ser), and glycine (Gly) were selected from the loading plot to use as diet proxies because they were the least correlated variables (see Results and Fig. 1). Dietary sources are composed of grain, rachis, legumes, woody, and herbaceous plants. Source fractions (i.e., diet AA components) were based on all of the available AAs: alanine (Ala), valine (Val), leucine (Leu), Asx (averaged δ 15 N values of deamidated asparagine and aspartic acid), Gly, Thr, Ser, Pro, and Phe, which were determined from the plants.
The weighting describes the contribution of N from the source AAs in different plants to the proxy AA δ 15 N values derived from herbivores. For example, Phe is not biosynthesized, nor does it undergo reversible reaction with other AAs (Matthews 2007). Due to this, the source fraction Phe is weighted 100% to proxy Phe because the herbivore Phe signal is derived entirely from diet. Unlike Phe, the N in Pro is cycled between most other AAs via Glx because it is biosynthesized via a reversible ring closure from Glu  (Metges 2000) led to conservative estimation of source fraction Thr contributes to proxy Thr as 50%. Gly and Ser has been found to be signi cantly 15 N-enriched in cattle (Kendall et al. 2017) and are thus treated as a trophic amino acid in the same way as Pro. Given that there are some considerable ambiguities surrounding the precise AA contribution from diet versus de novo synthesis (directly by the consumer or via rumen microbes unique to ungulates), which may also vary depending on nutrition, species, and other variables, 20% error is assumed on the input values (for more detail, see Electronic Supplementary Material 2, ESM 2; Table S1).
Concentration describes the original abundance of the AA in those same sources. The AA concentration of different plant groups were determined by calibration with Nle as an internal standard and then averaged together (for details on internal standardization, see Styring  Tables S1, S2, and S3 for PCA, ESM 2; Tables S1 and S2 for FRUITS).

Results
Principal component analysis (PCA) PCA was performed in two parts. First, PCA was performed using unnormalized AA δ 15 N values of unmanured crops, woody and herbaceous plants to explore correlation between AAs through loading plots (Fig. 1). The loading plots of woody and herbaceous oral AA δ 15 N values showed that in general, Gly is less correlated with Glx than other AAs, with Ser falling between the two (Fig. 1). Thr is more correlated with Glx in woody plants but much less correlated in herbaceous plants. Pro is generally correlated with Glx but shows less correlation than the AAs Ala, Asx (combined N of aspartate and amino N of asparagine), Leu, and Val. The group of AAs that highly correlate with Glx in woody and herbaceous plants include Val, Leu, Asx, Pro, and Ala, which also corresponds with trophic AAs in mammals as de ned by O'Connell (2017). Furthermore, Phe is also consistently correlated with Glx in plants. In general, herbaceous plants had similar loading distributions to woody plants (Fig. 1). The differences primarily re ect the closer correlation of Phe, Thr, and Ser with Glx in woody compared to herbaceous plants. These variables are therefore likely important in separating herbaceous and woody plants.
The cereal parts rachis and grain were also analyzed ( Fig. 1). Due to the limited number of crop samples (n = 8 for grain and rachis), the AAs which consistently correlated with Glx in herbaceous and woody plants, i.e. Ala, Leu, and Val, were left out of the analysis. In rachis, Gly and Thr were least correlated with Glx, with Ser in between Gly and Glx. The remaining AAs Phe, Asx, and Pro were highly correlated with Glx. This pattern mirrored the AAs in herbaceous and woody plants. The AA correlations in grains were more unusual. Thr is uncorrelated with Glx as expected, but Gly and Ser were more correlated with Glx than in other plants, while Phe was less correlated with Glx than expected. Due to the unique AA correlations in grains, Gly, Ser and Phe can be expected to separate between grain and rachis, as well as herbaceous and woody plants. A loading plot for legumes is not available due to low sample count being insu cient for analysis (n = 4).
In order to reduce noise, only the most uncorrelated AAs in the loading plots ( Fig. 1) were used to generate the plants-only score plot (Fig. 2a). The score plot shown was therefore built with the δ 15 N values of Phe, Thr, Pro, Ser, and Gly. Normalizing values to Glx, i.e. δ 15 N Glx -δ 15 N AA , to remove the effect of environmental variables (e.g. manuring, wetland N enrichment) zeroed the Glx values; therefore, Glx was omitted as a variable. Instead Pro, which is well correlated with Glx in plants (Fig. 1) and in herbivores (Kendall et al. 2017), acts as a substitute. The legumes formed their own individual cluster with the two samples of pea nearly overlapping and separated from the two samples of broad bean (Fig. 2a). Rachis and grain also formed unique clusters. The herbaceous plants are generally separated from woody plants, but there is some overlap between the two categories, likely due to the considerable variety of species sampled (19 unique woody and 15 unique herbaceous species). Furthermore, the crops are shown to be more related to the herbaceous than the woody plants in PC1, with rachis being more closely related to herbaceous and woody plants than the cereal grains and legumes. Finally, the control cattle rmly overlapped with herbaceous plants as expected (Fig. 2b), which demonstrates the applicability of PCA for dietary reconstruction.
Food Reconstruction Using Isotopic Transferred Signals (FRUITS) Phe, Thr, Pro, Ser, and Gly were used as proxies in the FRUITS model to be comparable with PCA results.  (Table 1). Grain, rachis, and legumes were predicted to have contributed 10 − 20% of diet. The results clearly demonstrate some discrepancy between known diet and model predictions.  Fig. S1).

Çatalhöyük
For Çatalhöyük, the wild herbivores considered in this analysis are represented by a single species, aurochs (Bos primigenius, n = 5), while the domestic herbivores comprised individuals belonging to the Ovis and Capra genera (ovicaprids, n = 5). Both groups appeared to be more closely related to the herbaceous than woody plants in the score plot (Fig. 3a). The domestic herbivores appear to be slightly separated from the wild cattle on PC1, but there is some overlap.

Makriyalos
At Makriyalos, the wild herbivores considered here are represented by red deer (Cervus elaphus, n = 2), while the domestic herbivores consisted of sheep (Ovis sp., n = 5) and cattle (Bos taurus, n = 4). The domestic herbivores cluster tightly together without discrimination between cattle and sheep (Fig. 3b) and overlap with herbaceous plants. There is clear separation of the domestic and wild herbivores on PC2. It is possible that the two wild red deer had some dietary contribution from woody plant or rachis sources due to a slightly closer association with these groups.

Vaihingen an der Enz
At Vaihingen, the wild herbivores in the study are represented by red deer (n = 5), while the domestic herbivores consisted of cattle (n = 5). Both the domestic and wild herbivore groups show similar degrees of dispersion and appear to overlap most closely with herbaceous plants (Fig. 3c).

FRUITS predictions for archaeological herbivore diet
Uncertainties for the predicted diets of archaeological herbivores are large ( Table 1). The uncertainties associated with the predicted contributions of herbaceous and woody plants were particularly large compared to those of the crops, which is expected due to the large number of different species represented within non-crops. Increased variance associated with herbaceous and woody plants is also re ected in the raw data and the PCA plots (Fig. 2a). In all cases, crops of each category are predicted to have contributed no more than 20% of the total diet. Furthermore, total diet contribution from crops were combined and compared against contribution from non-crops. Most herbivores were found to have higher total diet contribution from crops rather than noncrops (6-12%), except sheep at Makriyalos (-12%) and modern cattle fed on controlled pastures (-14%).
Limitations of FRUITS results are further elaborated in the discussion.

Discussion
The animals originally selected for CSIA-AA from each site had similar bulk collagen N isotope values close to the mean average for each site (ESM 3; Table S1). Consequently, the results of this study must be interpreted as the likely diet of a particular group of animals from each site, which probably had access to similar plant resources and, in the case of the domestic herbivores, had been managed in similar ways to others in the assemblage.
Exploring the relationships between AA δ 15  be highly correlated with that of Glx, which is found to be the case for Ala, Asx, Leu, Pro, Val and Phe (Fig. 1). A decreased degree of correlation between an AA and Glx is likely only to result when there are many intermediate steps to biosynthesis or when there are partially self-sustaining recycling mechanisms.
Gly and Ser are involved in one such recycling pathway. The SHMT-SGT pathway uses two molecules of Gly to produce one molecule of Ser. Excess Ser is in turn recycled into molecules of Gly. This pathway produces a de cit of one molecule of Gly if Ser is recycled from excess Gly. The Gly de cit can be balanced by biosynthesis from Glu (Bauwe et al. 2010). This is likely to explain why Gly and Ser can be less correlated with Glx than many of the other AAs (Fig. 1). Previous studies have indicated that increased biosynthesis of lignin via the phenylpropanoid pathway can lead to elevated Phe δ 15 N values (Kendall et al. 2019). This phenomenon has been suggested to be a distinguishing factor between cereal grains and rachis as well as herbaceous and woody plants, since rachis and woody plants contain more lignin than grains and herbaceous plants (Kendall et al. 2019). In Fig. 1, Phe is less correlated with Glx in herbaceous plants compared with woody plants. Increased demand for Phe in woody plants for lignin production would lead to higher levels of Phe biosynthesis from Glu, thereby increasing correlation with Glx. Interestingly, the consistent correlation of Phe with Glx in plants (Fig. 1)  Lack of consistent correlation of Thr with Asx ( Fig. 1) could be explained due to conversion of Thr to Gly through threonine aldolase (Joshi et al. 2006), the activity of which may be highly variable between species. Our loading plots of modern reference plant AA δ 15 N values (Fig. 1) is therefore consistent with known AA metabolic pathways and validates the differences in metabolism between various different plant categories under analysis.
These differences in AA metabolism between the different plant categories means that they are reasonably well-separated in a score plot (Fig. 2a). While a loading plot could not be generated for the legumes, the patterning of their AA δ 15 N values differs signi cantly from that of the other plant categories (ESM 1; Table S1, Styring et al. 2014a), allowing them to be separated from the other plants in Fig. 2a.
Assessing the potential and current limitations of using AA δ 15 N values and FRUITS to reconstruct herbivore diet Modern reference cattle overlap with herbaceous plants in Fig. 2b as expected in the score plot. However, while the overlap demonstrates a strong relationship between herbaceous plants and consumer diet, FRUITS was utilized to provide a quantitative estimate of percentage contribution from various dietary sources. The known diet of the control cattle from North Wyke Farm Platform was 100% C 3 herbaceous plants, which is not what the FRUITS model predicts (Table 1).
Given the evident limitation in applying the FRUITS model, a combined percent contribution of crops (grain, rachis, legume) and non-crops (herbaceous and woody plants) was calculated in Table 1. The score plot (Fig. 2a) shows su cient separation of crops and non-crops, which justi es this separate categorization. In the case of control cattle, this leads to a predicted 57% contribution from non-crop sources and 43% from crops. A 57% contribution of non-crops to the diet remains signi cantly lower than the actual contribution of 100%. This error can be attributed to the large standard deviations in the source data (as illustrated by data dispersion in Fig. 2a). Moreover, given the large uncertainties, it is unsurprising that there is insu cient statistical certainty to assign 0% contribution to crops. If some value (such as 10%) must be assigned to a category due to statistical uncertainty, even when the real contribution is 0%, then this error implies overestimation of crop contribution in Table 1, given that there are three categories of crops and only two of non-crops. Since the same source plant values were used when analyzing the unknown data, the predicted diets of control cattle outline the limits to which FRUITS results can be interpreted. Moreover, applying FRUITS to CSIA-AA data necessitates estimating percent contribution of different source AA amino groups to biosynthesized AAs found in bone collagen. Without a greater understanding of AA cycling through the 'metabolic pool' (O'Connell 2017), which may vary between species and can be in uenced by rumen microbes, it is di cult to achieve more precision. While uncertainty can be reduced with priors in Bayesian models, use of bad priors can have signi cant deleterious consequences for decreasing model accuracy, while giving the appearance of increasing precision (Cheung and Szpak 2020). Given these limitations and the relative performance of PCA and FRUITS in predicting the diet of control cattle, PCA results should be emphasized over FRUITS.
Using CSIA-AA to reconstruct herbivore diet at three Neolithic archaeological sites

Çatalhöyük
The overlap of domestic and wild herbivores with herbaceous plants in the score plot (Fig. 3a) is consistent with current interpretations of animal management strategies at Çatalhöyük. Current evidence suggests that the Konya plain surrounding Çatalhöyük was a dryland anabranching river system in the Neolithic (Ayala et al. 2017) and strontium isotope analysis of sheep tooth enamel suggests that sheep were likely to have been herded on this plain rather than in the more distant upland region . Aurochs, like their domestic counterpart, are primarily grazers (Gordon and Prins 2008). Therefore, in a region such as the Konya plain, it is reasonable to assume that herbaceous plants  Table S1).

Makriyalos
The domestic and wild herbivores from Makriyalos form two distinct groups in the score plot (Fig. 3b). This might suggest tightly controlled herding of domestic herbivores at Makriyalos, meaning that they had access to plants that were distinct from those available to the red deer. The domestic herbivores overlap mostly with herbaceous plants, while the red deer plot closer to woody plants and rachis. These results support the possibility of herding of domestic herbivores in local saline marshes, as suggested in Vaiglova et al. (2018). The more mixed diets of the two red deer individuals are consistent with known feeding patterns of red deer. Contrary to most deer species, the red deer is not predominantly a browser.
Red deer obtain as much as a third of their diet from grasses with the rest made up of concentrate foods which include items such as leaves, shrubs, and seeds (Gebert and Verheyden-Tixier 2001). In regions with sparse woodlands such as the plains surrounding Makriyalos, grasses and herbs can make up a signi cant portion of diet, so a large herbaceous plant contribution to red deer diet is not unexpected.

Vaihingen an der Enz
There is no distinct separation of domestic and wild herbivores in the score plot for Vaihingen (Fig. 3c), and both overlap with herbaceous plants.

Conclusion
The case studies referenced in this paper are restricted by use of historical data, but these results demonstrate that multivariate approaches using many AAs can allow for better visualization and extraction of information from CSIA-AA data than relying solely on raw values or the Δ 15    Vaihingen includes domesticated cattle (n = 5) and wild red deer (n = 5).