Diverse yet selective tuning of an odorant receptor for sensing four classes of musk compounds

doi:10.21203/rs.3.rs-1916850/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Musk was originally identified in male musk deer and other mammals to mark territories and attract females. In humans, musk compounds are widely used in perfumes and consumer products for their superior perceptual odor quality. Strikingly diverse natural and synthetic chemicals have similar “musky” odor, which has resulted in diverse models of musk odor perception and has raised questions regarding simplistic associations between chemical features and odor quality. Scientists’ lack of understanding of this principle has hampered the design of a novel musk compound. Here, we functionally identified the odorant receptor, OR5A2, as a receptor for the musky odor of diverse musk compounds. First, we discovered that engineered OR5A2 with enhanced expression in heterologous cells is sensitive and selective to musk compounds in all four structural classes. Second, the clarified functional variation of OR5A2 accounts for the reported association between genetic variation and perception in a musk compound. Finally, the revealed ligand selectivity of OR5A2 provides insight into developing a trained model for machine learning-based virtual screening of candidates for a new musk compound. We propose that OR5A2 contributes to the long-sought gateway for sensing musk compounds and generating its unique odor quality.

Cellular & Molecular Neuroscience

Musk

Olfaction

Odorant

Perfumery

Fragrance

Receptor

Machine learning

The musk compounds have played a key role in the creation of many perfumes over the past two centuries^1–5. While the name musk comes from the musk gland in musk deer, the practice to extract musk from animals is banned and musk compounds are mostly synthetic. Musk compounds are currently classified into four distinct structural classes (Figure 1A): the macrocyclic musks (MCMs), deriving originally from natural compounds; the nitromusks (NMs), which were the first to be commercialized; the polycyclic aromatic musks (PCMs), which have outstanding stability; and the alicyclic musks (ACMs) that were discovered at the end of the 20th century as the fourth generation of musk compounds^2,3,5,6. Although identifying these musk compounds has led to commercial success and reinitiated a search for novel structures, the principle behind the variable chemical structures of the compounds eliciting a musky odor has remained a mystery. This lack of understanding hampers the development of a novel musk compound with greater efficiency in terms of perception, price, safeness, and sustainability.

A plausible hypothesis to explain why chemically diverse compounds elicit the same musk sensation is that musk compounds are recognized through one or a small number of common olfactory receptors (ORs) that mediate the musky odor quality. The human genome contains approximately 400 genes encoding ORs, each of which presumably recognizes a unique subset of odorants with distinct affinity^7–16. Because functional assays in heterologous cells are difficult to perform, only 12% of human ORs have been matched with their cognate ligands^13,17. These include OR5AN1, the first human OR identified that specifically detects some musk compounds. OR5AN1 is activated by MCMs and NMs but not by other musk compounds such as PCMs and ACMs^11,18–20. This restricted receptive range of OR5AN1 suggests that additional receptor(s) may sense the remaining musk compound classes. Additionally, recent studies highlighted candidate musk receptor(s) in humans by taking advantage of perceptual variation among people and genetic polymorphisms of ORs^13,21. These studies revealed individual differences in the perceived intensity of a PCM, galaxolide 14, associated with genetic variations located in chromosome band 11q12.1, a region that includes five ORs: OR5AN1, OR5A2, OR5A1, OR4D6, and OR4D10. However, the causal OR(s) remain unclear without functional proof of these associations.

Design of Consensus ORs

First, we conducted a screening experiment to search for another musk receptor. HEK293 cells were transfected with each of 417 human ORs and stimulated with four musk compounds (Figure S1). Activation of OR5AN1 was consistently observed upon stimulation with two MCMs, globanone^TM 4 and muscenone^TM delta 6. However, ambrettolide 11, another MCM, and galaxolide 14 did not activate any ORs, including a major haplotype of the ORs located in chromosome band 11q12.1. These negative results were likely because the OR(s) for ambrettolide 11 and galaxolide 14 were not functionally expressed in heterologous HEK293 cells.

A previous study reported that the ligand selectivity of orthologous ORs is conserved among mammalian species²². Subsequently, an evolutionary consensus strategy was shown to enable functional expression of an OR in heterologous cells¹³. We applied the same strategy to the five ORs located in chromosome band 11q12.1. For example, to design a consensus version of OR5A2, the amino acid sequences of 111 mammalian OR5A2 orthologs and human OR5A2 were aligned to identify consensus amino acid residues, each of which was conserved across more than 50% of the proteins at a given site but was not conserved in humans (Figures 1B and 1C; see Supplementary text files for details of the amino acid alignment). Introducing the consensus amino acids into human OR5A2 resulted in a consensus version of OR5A2 with 92% amino acid identity to the original human OR5A2. Using the same strategy, consensus versions of the other four ORs located in chromosome band 11q12.1 were also designed (Table S1).

We investigated the functional properties of these engineered ORs with regard to cell surface expression, selectivity, and sensitivity to agonists in comparison with the original proteins. First, changes in the cell surface expression levels of ORs were evaluated by flow cytometry, detecting their N-terminal FLAG tags (Figures 1D and 1E). Fluorescent signals from the original versions of OR5AN1 (a reference gene with Leu289 and a genetic variant with Phe289) and OR4D6 were detected on the surface of HEK293 cells when compared with the signal from cells transfected without any OR. No cell surface expression was detected in OR5A2-, OR5A1-, or OR4D10-expressing cells. Consensus versions of all five ORs showed cell surface expression. Second, the five ORs included two functionally well-characterized ORs, OR5AN1 and OR5A1, which allowed us to evaluate the effectiveness of the consensus strategy based on responses to known agonists. When expressed in HEK293 cells, both the consensus version and the original sequences of OR5AN1 responded to NMs and macrocyclic ketones of MCMs (1-8) (Figure 2). The improved general responsiveness of consensus OR5AN1 allowed detection of the previously reported weak agonists cyclopentadecanol 9 and ethylene brassylate 10^11,19. Dose-response analysis of consensus OR5AN1 validated its improved sensitivity (Figure S2). OR5A1 was previously demonstrated to be a causal receptor for β-ionone perception^23,24. Applying consensus amino acid residues improved detection of a response to β-ionone 21 as well as ambrinol 22 (Figure 2). Thus, introducing evolutionarily conserved amino acid residues into human ORs did not alter their distinct ligand selectivity but improved the measurement of OR activation by ligands, including physiologically relevant ones, as demonstrated for OR5A1 and previously for OR6Y1¹³.

The Receptor for Four Classes of Musk Compounds

We screened the consensus versions of the five ORs located in chromosome band 11q12.1 for another musk receptor and identified OR5A2 (Figure 2). HEK293 cells expressing the consensus OR5A2 (hereafter, cOR5A2) responded significantly to odorants 1-20 at a concentration of 100 µM, which include all of the tested traditional musk compounds from the four classes (1-8, 10-18, Figure 1A). No responses were observed in cells expressing the original OR5A2 (Figure 3). In contrast, cells expressing consensus versions of OR5A1, OR4D6, and OR4D10 did not respond to any musk compounds. Subsequent dose-response analysis validated the activity of the 20 odorants on cOR5A2 based on the reported statistical criteria (Figure 3 and Table S2)^9,25. None of the other compounds (21-48) activated cOR5A2 despite having structures partially similar to those of the tested musk compounds. Furthermore, we detected no activation of cOR5A2 by 45 non-musk odorants in eight mixtures (49-56), demonstrating the high selectivity of cOR5A2 for musk compounds. Consistent selectivity for musk compounds was observed when four consensus residues of cOR5A2 near the predicted ligand-binding pocket were substituted with those of human OR5A2 (Figure 2 and Figure S3).

Despite the reported associations^13,21, our functional assay did not detect activation of OR4D6 or OR5A1, or of their consensus versions, by galaxolide 14 (Figure S1 and Figure 2). A missense variant of OR5A2 (rs1453547 in Table S3) was associated with a lower intensity ranking of galaxolide 14, suggesting that OR5A2 and its P172L variant are sensitive and insensitive to galaxolide 14, respectively. This hypothesis dovetailed with the results of the functional assay using the consensus version of OR5A2: cOR5A2 with Pro172 was more sensitive to galaxolide 14 than cOR5A2 with Leu172 (Figure 3). The correlation between OR5A2 responsiveness and the previously reported perceived intensity of galaxolide 14 provides the first functional evidence that OR5A2 is a causal receptor for perception of the musk compound.

Substituting leucine at position 172 of cOR5A2 led to a drastic loss of sensitivity to all tested musk compounds (Figure 3). In contrast, genetic variation of OR5AN1 produced only a modest difference in sensitivity to macrocyclic ketones and an NM (Figure S2). Therefore, humans who carry the OR5A2 P172L variant may be insensitive not only to galaxolide 14 but also to other OR5A2-specific musk compounds. This may partly explain the well-known specific anosmia for musk compounds^26,27. Our findings appear to be consistent with previous non-peer-reviewed reports disclosing a function of the original sequence of OR5A2^28,29.

The unveiled ligand selectivity of cOR5A2 allows us to discuss the identity of the perceptual odor quality that OR5A2 mediates. The 48 tested compounds were classified into 20 agonists and 28 non-agonists for cOR5A2. As mentioned above, the 20 agonists cover all 17 traditional musk compounds in the four classes (1-8, 10-18)^2,4. Among the remaining three agonists, cyclopentadecanol 9 and ω-dodecanolactam 19 are not commonly used as perfumery raw materials, and therefore information about their odor characters is lacking. The weakest agonist, amber xtreme^TM 20, is registered with the odor descriptor musky in the public odor database (Table S4)³⁰, though its major odor character is ambery, as the compound name indicates². In contrast, the 28 non-agonists do not include any well-recognized musk compounds. Among the 28 non-agonists, seven are registered with musky as one of their odor descriptors in the public database (ambrinol 22, Ysamber^TM K 23, amberketal 25, ambroxan^TM 36, methyl cedryl ketone 38, p-t-butyl cyclohexanone 41, and sandalmysore core^TM 47)^30–33. However, the main odor characters of these odorants are known to be amber and/or woody rather than musky^2,30. Therefore, traditional musk compounds in the four classes most likely generate the musky odor perception through OR5A2, although they may also activate other OR(s) that mediate other odor characters such as amber and woody. This results in associative learning between musky and related odor characters, creating a blurry boundary between them in sensory evaluation. Future studies using our consensus strategy may allow us to define the odor quality of musk compounds based on a larger number of ORs. Nonetheless, we propose OR5A2 as a promising candidate for drawing a border between odorants with and without the unique perceptual odor quality of musk compounds.

cOR5A2-based Machine Learning Model for Predicting Musky Compounds

The axes defining odorants with or without musky odor have been unknown. The present study proposes a new potential boundary of them: agonist or non-agonist for cOR5A2. If the boundary is generalizable for more compounds, a machine learning (ML) model trained by structural information of the agonists and non-agonists allows classification of a wide range of musky compounds, and interpretation of the decision-making process provides understanding for the structural definition of musky compounds.

Interpretable ML analyses were conducted with the dataset containing 20 agonists and 27 non-agonists of cOR5A2 (Table S4). Structural information of these 47 compounds was represented as 2048 bits of substructures by the count-based Morgan fingerprint with radius 2. These 2048 bits were narrowed down based on a regression analysis of the agonistic activity of the 47 compounds, resulting in 14 bits of substructures (Figure 4A). We compared the prediction performance of eight classifiers trained with the dataset. The Naïve Bayes (NB) model achieved the best performance (Figure 4B, Table S5).

We then tested whether the high-confidence prediction by the NB model was applicable to more musky compounds registered in the public odor database^30,34. This database provides data on 3924 odorants with odor type descriptions and simplified molecular input line entry system (SMILES) strings. The 3924 odorants include 79 with the odor descriptor musky. A trained ML model can maintain its expected performance only when it is used to predict compounds in a chemical space similar to that of the training dataset^35–37. Therefore, we calculated the Tanimoto similarity (TS) to the 47 training compounds based on the extended connectivity fingerprint and extracted 244 compounds with a maximum TS score > 0.5 (Figure 4C). Of these, 20 compounds were omitted because they had been used for training the model. This process resulted in a test dataset of 224 compounds, which contained 27 musky compounds and 197 non-musky compounds.
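
The similarity filter described above can be illustrated with a minimal sketch. It assumes each fingerprint is reduced to the set of its on-bits (a binary simplification; the fingerprints used in the study may differ), and the compound names and bit indices are hypothetical toy data rather than values from the study.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_similar(candidates, training, threshold=0.5):
    """Keep candidates whose maximum similarity to any training compound
    exceeds the threshold, skipping compounds already used for training."""
    selected = {}
    for name, fingerprint in candidates.items():
        if name in training:
            continue  # training compounds are omitted from the test set
        best = max(tanimoto(fingerprint, ref) for ref in training.values())
        if best > threshold:
            selected[name] = best
    return selected


# Hypothetical toy fingerprints (sets of on-bit indices), for illustration only.
training = {
    "train_a": frozenset({1, 2, 3, 4}),
    "train_b": frozenset({10, 11, 12}),
}
candidates = {
    "cand_close": frozenset({1, 2, 3, 5}),     # shares 3 of 5 bits with train_a
    "cand_edge": frozenset({10, 11, 30, 31}),  # shares 2 of 5 bits with train_b
    "cand_far": frozenset({20, 21}),           # shares nothing
    "train_a": frozenset({1, 2, 3, 4}),        # already in the training set
}
selected = select_similar(candidates, training)
print(selected)
```

In this toy case only `cand_close` passes the filter, since its maximum similarity (3 shared bits out of 5 in the union) is above 0.5.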

Against the test dataset, the NB model maintained its good performance on all the test data based on its Accuracy and ROC_AUC score of 96.0% and 97.7% (Figures 4C and 4D, Table S6), respectively. Of the 36 compounds predicted as musky compounds, 27 were actually known as musky odorants in the database, while the other nine compounds were not (Precision=75%). Eight out of the nine false positive compounds, which were commercially available, were subjected to follow-up experiments and validated to have no agonistic activity to cOR5A2 (Figure S4). All of the 27 musky compounds in the test dataset were successfully predicted (Recall=100%). The model for predicting the positive class also achieved promising performance with the F1 score of 0.857. The ROC curve bowed close to the coordinate of (0,1), providing evidence that the NB model learned important patterns well and was able to precisely distinguish musk and non-musk compounds with various structures (Figure 4E).
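
As a consistency check, the reported scores can be recomputed from the counts given in the text. The sketch below assumes the confusion matrix implied by those counts (36 predicted positives, of which 27 are true positives, no false negatives, 224 compounds in total); the function name is illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification scores from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts implied by the text: 224 test compounds, 27 musky, 36 predicted musky.
tp = 27                       # musky compounds predicted as musky
fp = 36 - tp                  # predicted musky but not registered as musky
fn = 27 - tp                  # musky compounds that were missed
tn = 224 - tp - fp - fn       # remaining non-musky compounds

metrics = classification_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

These counts give a precision of 0.750, a recall of 1.000, an F1 score of 0.857, and an accuracy of 0.960, in agreement with the values quoted above.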

The relative importance of each bit structure in the decision-making process was interpreted based on SHAP mean value (Figure 4F)³⁸. In addition, LRC analysis revealed that all of the important bit structures were found to be positively correlated with the target, indicating that a compound that contains the top ranked substructures would have higher probability to be predicted as positive by the NB model (Figure 4G). Bit structures #89 and #94 represented the substructures of NMs identified as agonists for cOR5A2. Bit structures #40, #147, and #155 were represented as a macrocyclic ring with a carbonyl group, an ether group and a carbonyl group, or a methyl group, respectively. They also demonstrated the characteristics of the MCMs. Moreover, bit #207 and #19 represented substructures of PCMs. Bit #113 corresponded to the substructure of ACMs. These bits/substructures constituted an axis to define musky compounds with diverse structures.

Previous studies demonstrated that chemical features of ligands of ORs allowed for the development of ML models capable of predicting new ligands^35–37. Our NB model performed better to classify agonists and non-agonists for cOR5A2 based on the cross-validation outcome; the F1 score in our NB model was 0.974, whereas previous studies resulted in a score below 0.80³⁶. More importantly, the predicted agonists for cOR5A2 from the public database dovetail well with musky compounds, providing the first example of a ML model that can predict the odor type of a given compound based on ligands of a single OR. The difficulty in developing such a ML model can be explained by the olfactory principle that each perceptual odor type, in general, is considered to be mediated by multiple ORs³⁹. However, the exceptional molecular mechanism of musky perception, which is elicited through a smaller number of ORs^11,18, provided a rare opportunity to realize the model. Our results not only present a model with efficient performance to predict musky compounds, but also the importance of OR5A2 to draw a border between odorants and musky compounds with diverse structures.

Despite the lack of a straightforward research methodology, the past two centuries of research have provided outstanding success with discoveries and syntheses beyond the fourth generation of musk compounds². However, the principle underlying the mechanism by which diverse structures of musk compounds generate a common musky odor quality has been a long-standing mystery for both chemists and biologists, with a fundamental question of odor-quality-coding in olfaction. The present study offers an answer to this mystery along with a technological advance in olfactory research. An efficient functional assay system of ORs was established based on the consensus strategy, which did not alter functional characteristics of original ORs but also demonstrated sensitive detection of odorant-induced responses. The reconstitution of ORs with consensus amino acids in HEK293 cells proved a predicted function and unveiled further characteristics of OR5A2 for commonly but selectively sensing diverse structures of musk compounds. This consensus strategy will allow us to identify ORs for odorants of interest and to study the mechanisms of how odor qualities are coded by hundreds of ORs. The identified long-sought principle, a musk receptor OR5A2, paves the way for receptor-based generation of musk compounds with more efficient performance.

Acknowledgements

The authors thank Mari Kobayashi for preparing plasmids and for conducting dose-response analysis of ORs; Koichi Murayama for information on the odorants used in this study; Emily Xu, Priyanka Meesa, and Ryoichi Matsui for comments on this manuscript; and members of the Sensory Science Research lab and Material Science Research for help and discussions.

Author Contributions

K.Y. conceived the project, designed and performed all experiments, and wrote the manuscript. J.D. designed the experiment related to Figures 2-4 and wrote the manuscript. H.J. constructed machine learning models, conducted model interpretation and virtual screening, and wrote the manuscript. H-Y. L. designed cOR5A2 with the humanized ligand-binding pocket. H.M. conceived the project, designed experiments, and wrote the manuscript.

Declaration of Interests

K.Y., J.D. and H.J. are employees of Kao Corporation. This does not alter the authors’ adherence to all journal policies on sharing data and materials. Kao has applied for patents related to cOR5A2. There are no products in development or marketed products to declare. H.M. has received royalties from ChemCom, research grants from Givaudan, and consultant fees from Kao Corporation.

Odorants

The musk odorants used in this study were provided by Takasago International Corporation or were purchased from Wako or TCI (Table S4). The musk odorants, their structural analogues, and odorant mixtures used in this study are as follows: 1. musk ketone, 2. musk xylol, 3. civettone, 4. globanone^TM, 5. cyclopentadecanone/exaltone (Firmenich), 6. muscenone^TM delta, 7. muscone, 8. cosmone^TM, 9. cyclopentadecanol, 10. ethylene brassylate, 11. ambrettolide, 12. habanolide^TM, 13. pentalide, 14. galaxolide, 15. tentarome/tonalid^TM, 16. celestolide, 17. helvetolide, 18. romandolide^TM, 19. ω-dodecanolactam, 20. amber xtreme^TM, 21. β-ionone, 22. ambrinol, 23. Ysamber^TM K, 24. p-cresyl phenyl acetate, 25. amberketal, 26. δ-dodecalactone, 27. cyclopentadecane, 28. cyclododecanone, 29. 2-pentadecanone, 30. Calone, 31. heliotropyl acetone, 32. 8-pentadecanone, 33. florex^TM, 34. styrallyl acetate, 35. β-damascone, 36. ambroxan^TM, 37. isolongifolanone, 38. methyl cedryl ketone, 39. 6-methyl quinoline, 40. raspberry ketone, 41. p-t-butyl cyclohexanone, 42. Kephalis, 43. Amber Core^TM, 44. (R)-(+)-sclareolide, 45. iso e super, 46. boisambrene^TM forte, 47. sandalmysore core^TM, 48. methyl β-naphthyl ketone, 49. aldehyde mixture (n-hexanal, n-octanal, 1-decanal, 2-methyl undecanal, melonal, and helional), 50. alcohol mixture (cis-3-hexenol, 1-dodecanol, 1-decanol, 1-octanol, hexanol, and geraniol), 51. ketone mixture (methyl heptenone, p-amyl cyclohexanone, acetophenone, claritone, and fructone), 52. acid mixture (phenyl acetic acid, acetic acid, benzoic acid, hexanoic acid, octanoic acid, and iso-valeric acid), 53. terpene mixture (l-carvone, l-menthol, iso-menthone, α-terpinene, camphene, and dipentene), 54. lactone mixture (γ-decalactone, γ-nonalactone, γ-undecalactone, jasmolactone, γ-valerolactone, and γ-octalactone), 55. ester mixture (methyl cinnamate, benzyl propionate, iso-bornyl acetate, cis-3-hexenyl hexanoate, iso-amyl n-butyrate, and phenyl ethyl acetate), and 56. camphor mixture (1,4-cineol, iso-borneol, borneol, and camphor). Musk ketone and musk xylol were prepared as 50 mM EtOH solutions. Each of the other odorants was prepared and stocked as a 100 mM EtOH solution. These stock solutions were stored in a freezer and diluted with medium for stimulation. The odorant mixtures (#49-56) were prepared as EtOH solutions and applied to cells at a final concentration of 100 µM for each component.

Designing Consensus ORs

BLASTP searches were conducted using the reference amino acid sequence of a target OR as a query against the database of reference proteins. The numbers of orthologous receptors identified, excluding human receptors, were as follows: 242 for OR5AN1, 196 for OR5A1, 111 for OR5A2, 134 for OR4D6, and 102 for OR4D10 (Supplementary text files). Amino acid alignment using ClustalW allowed identification of consensus amino acid residues, which were conserved across more than 50% of orthologous receptors at a given site. A consensus amino acid was inserted into a human OR when more than 60% of orthologous ORs had an amino acid at the position. Conversely, an amino acid was deleted from a human OR when more than 60% of orthologous receptors did not have an amino acid at the corresponding position. The consensus amino acid sequences were translated into DNA sequences through codon optimization. The DNA sequence of each OR was synthesized by GenScript Japan and inserted into a pME18S vector to generate OR proteins fused with N-terminal epitope tags consisting of the eight amino acids of the FLAG tag followed by the twenty N-terminal amino acids of bovine rhodopsin.
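
The substitution, insertion, and deletion rules above can be sketched as a single pass over a pre-computed alignment. This is an illustrative reading, not the authors' pipeline: it assumes gaps are written as "-", that all fractions are taken over the total number of orthologs, and that an inserted residue is the most common one at that column. The function name and the toy sequences are hypothetical.

```python
from collections import Counter

GAP = "-"


def design_consensus(human, orthologs, sub_frac=0.5, indel_frac=0.6):
    """Apply consensus rules column by column to an aligned human sequence.

    `human` and every entry of `orthologs` are aligned strings of equal length.
    """
    n = len(orthologs)
    designed = []
    for i, h in enumerate(human):
        residues = Counter(seq[i] for seq in orthologs if seq[i] != GAP)
        n_res = sum(residues.values())
        top, top_count = residues.most_common(1)[0] if residues else (GAP, 0)
        if h == GAP:
            # Insertion: more than 60% of orthologs carry a residue here.
            if n_res / n > indel_frac:
                designed.append(top)
        elif (n - n_res) / n > indel_frac:
            # Deletion: more than 60% of orthologs have no residue here.
            continue
        elif top_count / n > sub_frac:
            # Substitution (or no change when the human residue is the consensus).
            designed.append(top)
        else:
            designed.append(h)
    return "".join(designed)


# Toy alignment, for illustration only (not real OR sequences).
human = "MK-LA"
orthologs = ["MRGLA", "MRGL-", "MRGL-", "MKGL-", "MR-L-"]
consensus = design_consensus(human, orthologs)
print(consensus)
```

In the toy alignment, position 2 is substituted (K to R), a residue is inserted at the human gap, and the final residue is deleted, so the designed sequence is `MRGL`.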

Expression Vector

The sequence information of human ORs used in this study was reported previously^10,40. Genes coding human ORs, trace amine-associated receptors (TAARs), and vomeronasal receptors type I (VN1R1s) were amplified from human genomic DNA (Promega, Madison, WI, USA). The identified single nucleotide polymorphisms (SNPs) that were different from the reference sequences were not modified. When we amplified an OR gene with unknown missense mutations, we modified the genes with mutations to reference sequences. The human RTP1S gene was amplified from human genomic DNA (Promega, Madison, WI, USA), and it was inserted into pME18S without any N-terminal epitope tag. cDNA of muscarinic acetylcholine receptor M2 was purchased from Thermo Fisher Scientific (Waltham, MA, USA), and the ORF was inserted with pME18S, containing both the N-terminal epitope tags of FLAG and twenty amino acids of bovine rhodopsin.

Cells

HEK293 cells were grown in Dulbecco's Modified Eagle Medium (DMEM, 4.5 g/L glucose, Nacalai Tesque, Kyoto, Japan) supplemented with 10% fetal bovine serum (FBS). Cells were cultured on a 100-mm cell culture dish at 37 °C in a humidified atmosphere containing 5% CO₂. Cells were split every two to four days before reaching confluence. When passaging cells, all medium in the cell culture dish was removed, and cells were gently washed with PBS. Trypsin-EDTA (0.25%, Invitrogen, Waltham, MA, USA) was used to detach cells from the bottom of the dish. An equal volume of DMEM with 10% FBS was added to the dish immediately after the cells detached. The cell suspension was transferred to a 15 ml polypropylene tube and centrifuged for 1 minute at 200g at room temperature. DMEM and trypsin-EDTA were aspirated, and the cell pellet retained at the bottom of the tube was resuspended in DMEM with 10% FBS. The cell suspension was transferred to a 100-mm cell culture dish with 9 ml DMEM and 10% FBS.

Flow Cytometry Analyses

HEK293 cells were grown to confluency, resuspended, and seeded onto a 35-mm cell culture dish or 6-well plate. The number of seeded cells was 3.6×10⁵ in 2 ml DMEM with 10% FBS. The cells were cultured overnight before transfection. The DNA transfection mixture in 100 μl DMEM contained 3 µg of FLAG-Rho-tagged OR, 1 µg RTP1S, and 10 μl polyethylenimine Max (PEI-MAX, 0.1%, pH 7.5, Polysciences, Warrington, PA, USA). To prepare cells without any receptors (mock-transfected cells) or M2AchR-expressing cells, 3 µg of empty vector or FLAG-Rho-tagged M2AchR, respectively, was transfected instead of a FLAG-Rho-tagged OR. After incubation at room temperature for 15 min, the DNA transfection mixture was transferred to the cell culture dish. After 24 hours, the cells were detached and resuspended with 1 ml cell stripper (Corning, NY, USA) and 1 ml PBS with 2% FBS on ice. The cells in 15 ml polypropylene tubes (AGC TECHNO GLASS, Shizuoka, Japan) were centrifuged for 3 minutes at 200g at 4 °C. The cell pellet at the bottom of the tube was resuspended with 120 µl of 0.3 mg/ml primary antibody [Anti-DYKDDDDK, Mouse-Mono (2H8, TransGenic, Fukuoka, Japan)] in PBS with 2% FBS and incubated for 60 minutes on ice. At the end of the incubation period, the cells were washed twice with 1 ml PBS with 2% FBS and stained with 100 µl of 50 mg/ml phycoerythrin (PE)-conjugated goat anti-mouse IgG H&L antibody (FUJIFILM Wako Pure Chemicals, Osaka, Japan) in PBS with 2% FBS for 30 minutes on ice in the dark. Dead cells were stained by adding 0.5 mg/ml 7-amino-actinomycin D (FUJIFILM Wako Pure Chemicals, Osaka, Japan). The cells were analyzed using BD FACSuit with gating for single, spherical, viable cells, and the measured PE fluorescence intensities were analyzed and visualized. We normalized the cell surface expression levels using cells expressing FLAG-Rho-tagged M2AchR, which was robustly expressed on the cell surface, and mock-transfected cells. The normalized expression level was calculated as [PE (OR) − PE (mock)] / [PE (M2AchR) − PE (mock)], where PE (OR) = mean PE from cells transfected with an OR; PE (mock) = mean PE from cells without any receptors; and PE (M2AchR) = mean PE from cells transfected with FLAG-Rho-tagged M2AchR. Statistical analyses were performed using a one-tailed t-test, and the significance level was set at P<0.05.
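
The normalization can be written out as a short function. The sketch below follows the formula as stated; the function name and the fluorescence values are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def normalized_expression(pe_or, pe_mock, pe_m2achr):
    """[PE(OR) - PE(mock)] / [PE(M2AchR) - PE(mock)] using mean PE intensities."""
    mock = mean(pe_mock)
    return (mean(pe_or) - mock) / (mean(pe_m2achr) - mock)


# Hypothetical PE intensities from replicate measurements, for illustration only.
pe_mock = [100, 120, 110]       # cells without any receptor
pe_m2achr = [1100, 1120, 1110]  # positive control, FLAG-Rho-tagged M2AchR
pe_or = [600, 620, 610]         # cells transfected with an OR

level = normalized_expression(pe_or, pe_mock, pe_m2achr)
print(level)
```

With this scaling, mock-transfected cells map to 0 and M2AchR-expressing cells map to 1, so the toy OR above lands at 0.5.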

cAMP response Element (CRE)-regulated Luciferase Reporter Gene Assay

The Dual-Glo Luciferase Assay (Promega) was used to determine the activities of firefly and Renilla luciferase in HEK293 cells. Transfection and luciferase assays were performed as previously described^10,11. Briefly, 75 ng of a FLAG-Rho-tagged OR pME18S, 30 ng of CRE/luc2PpGL4.29, 30 ng of pRL-CMV, and 30 ng of RTP1S pME18S were applied in 10 µl DMEM with 0.41 µl of PEI-MAX (0.1%, pH 7.5) for each well of a poly-D lysine-coated 96-well plate (Corning, NY, USA). For testing cOR5A2 with the humanized ligand-binding pocket, 7.5 ng of a FLAG-Rho-tagged OR pME18S, 3.0 ng of CRE/luc2PpGL4.29, 30 ng of pRL-CMV, and 30 ng of RTP1S pME18S were applied. After incubation for 15 min, 90 µl of cell suspension (2 x 10⁵ cells/cm² in DMEM with 10% FBS) was added to the 10 µl transfection solution, and the plate was incubated for 24 hours. For each well of a poly-D lysine-coated 384-well plate (Corning, NY, USA), 29 ng of a FLAG-Rho-tagged OR pME18S, 22 ng of CRE/luc2PpGL4.29, 11 ng of pRL-CMV, and 12 ng of RTP1S were applied in 4.4 µl DMEM with 0.16 µl of PEI-MAX (0.1%, pH 7.5). After incubation for 15 minutes, 40 µl of cell suspension (2 x 10⁵ cells/cm² in DMEM with 10% FBS) was added to 4.4 µl of the transfection solution, and the plate was incubated for 24 hours. The 384-well plate assay was performed using the BiomekFX laboratory automation system (Beckman Coulter, Brea, CA, USA). After 24 hours of transfection, the medium was removed, and the transfected cells were stimulated with an odorant solution diluted in DMEM (75 µl and 40 µl per well for the 96-well plate and 384-well plate, respectively). The 96- or 384-well plates were sealed and incubated at 37 °C for 3-4 hours. The luciferase reporter gene activities were measured with a Mithras LB940 (Berthold Technologies, Bad Wildbad, Germany) and an Ensight multimode plate reader (PerkinElmer, Waltham, MA, USA). Odorant-induced activity was calculated as fold increase or normalized response. Fold increase was calculated as Luc(N) divided by Luc(0), where Luc(N) was the luminescence intensity of firefly luciferase divided by the luminescence intensity of Renilla luciferase of a given odorant-stimulated well, and Luc(0) was the luminescence intensity of firefly luciferase divided by the luminescence intensity of Renilla luciferase of a given non-stimulated well. Normalized response was calculated as [Luc(N) − Luc(0)] / [Luc(Fk) − Luc(0)] × 100, where Luc(Fk) was the luminescence intensity of firefly luciferase divided by the luminescence intensity of Renilla luciferase of a forskolin-stimulated well (30 mM, LKT Laboratories Inc., Phalen Blvd., MN, USA). In dose-response analyses, the three reported statistical criteria were applied: (i) the 95% compatibility intervals (CIs) of the top and bottom parameters do not overlap, (ii) the standard deviation of logEC₅₀ is less than 1, and (iii) the extra sum-of-squares test shows that the odorant-response curve differs from the empty-vector curve (P<0.05)^9,25. We confirmed that the EC₅₀ values calculated from the sigmoidal curves with the normalized response values did not differ by more than 1 log step from those with raw firefly luciferase values⁸. Data analysis was performed using Microsoft Excel or GraphPad Prism software, and data were fitted to a sigmoidal curve using GraphPad Prism software.
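
The two response measures defined above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are not from the study, and the luminescence readings are hypothetical values chosen to keep the arithmetic easy to follow.

```python
def luc_ratio(firefly, renilla):
    """Firefly luminescence normalized to Renilla luminescence for one well."""
    return firefly / renilla


def fold_increase(luc_n, luc_0):
    """Luc(N) / Luc(0): stimulated well relative to the non-stimulated well."""
    return luc_n / luc_0


def normalized_response(luc_n, luc_0, luc_fk):
    """[Luc(N) - Luc(0)] / [Luc(Fk) - Luc(0)] x 100, in percent of forskolin."""
    return (luc_n - luc_0) / (luc_fk - luc_0) * 100


# Hypothetical raw luminescence readings (firefly, Renilla), for illustration only.
luc_0 = luc_ratio(500, 1000)     # non-stimulated well
luc_n = luc_ratio(2000, 1000)    # odorant-stimulated well
luc_fk = luc_ratio(3500, 1000)   # forskolin-stimulated well

fold = fold_increase(luc_n, luc_0)
response = normalized_response(luc_n, luc_0, luc_fk)
print(fold, response)
```

For these toy readings the odorant gives a 4-fold increase over baseline, which corresponds to 50% of the forskolin response.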

Construction of cOR5A2 with humanized ligand binding pocket

The structures of cOR5A2 and native human OR5A2 were generated using the open-source AlphaFold (version 2.1.0) developed by DeepMind. The full amino acid sequence was fed into the AlphaFold model trained on the full genetic database, including BFD, MGnify, PDB70, PDB, Uniclust30, and UniRef90. From the generated PDB structures, the two proteins were centered and rotated with the Kabsch method to minimize the RMSD for structural comparison. The ligand-binding pocket was manually labeled by taking extracellular loop 2 and the top of transmembrane domains 3, 5, 6, and 7 above the FYG motif of TM6. Distance, directionality, and angle were calculated for every amino acid relative to the binding pocket center. Here, the center of the binding cavity was obtained with the common centroid function. Distance was calculated between the longest carbon chain position of the amino acid (CX) and the center of the binding pocket. The angle of each amino acid was obtained as the angle between two vectors: alpha carbon (CA) to CX on the amino acid, and CA to the binding pocket center. All data pre-processing and analysis were conducted using the open-source Python packages pandas (version 1.2.1) and numpy (version 1.19.5).
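
The per-residue geometry described above can be sketched with plain coordinate arithmetic. The example below is a minimal illustration rather than the analysis code used in the study: it takes the pocket center as the unweighted centroid of the labeled pocket atoms (an assumption), skips the Kabsch superposition step, and uses hypothetical coordinates.

```python
import math


def centroid(points):
    """Unweighted centroid of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def distance_to_center(cx, center):
    """Euclidean distance from the side-chain atom CX to the pocket center."""
    return math.dist(cx, center)


def angle_to_center(ca, cx, center):
    """Angle in degrees between the vectors CA->CX and CA->pocket center."""
    v1 = [cx[i] - ca[i] for i in range(3)]
    v2 = [center[i] - ca[i] for i in range(3)]
    dot = sum(a * b for a, b in zip(v1, v2))
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


# Hypothetical coordinates in angstroms, for illustration only.
pocket_atoms = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)]
center = centroid(pocket_atoms)

ca = (4, 1, 0)
cx_inward = (3, 1, 0)    # side chain pointing at the pocket center
cx_sideways = (4, 2, 0)  # side chain perpendicular to the pocket direction

print(center)
print(distance_to_center(cx_inward, center), angle_to_center(ca, cx_inward, center))
print(angle_to_center(ca, cx_sideways, center))
```

A small angle indicates a side chain oriented toward the pocket center, while an angle near 90 degrees or larger indicates a side chain pointing away from it.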

Data Introduction and Pre-processing for Machine Learning

A dataset containing 20 agonists and 27 non-agonists against a consensus version of OR5A2 was used to train and evaluate the ML classifiers. An oversampling technique was applied to randomly replicate the agonist data samples⁴¹. In this study, we used compound structural information as data features, represented by the Count-based Morgan Fingerprint (CMF) with radius 2, which represents each chemical structure as molecular fragments by iteratively obtaining distinct paths through each atom of the molecule⁴². These fragments were hashed into a Morgan Fingerprint bit (MFB) vector. The CMF counts the occurrences of each MFB, forming an integer vector instead of a binary vector. The CMFs were calculated using the RDKit library (version 2020.03.2), and we also confirmed manually that no bit collisions occurred in this study. Overall, 2048 MFBs were calculated. Since the number of data samples used in this study was relatively small, we conducted feature selection (FS) to eliminate non-correlated features by calculating intercorrelations and regression coefficient values to the target variable⁴³. One feature in every pair with an intercorrelation value over the threshold of 0.9 was omitted. We also ensembled four regression models, Least Absolute Shrinkage and Selection Operator (LASSO), Ridge Regression (RR), Stepwise LASSO, and Random Forest (RF), to omit low-correlation features. Observations with missing values and features with no variance were also eliminated. All features were standardized to a mean of zero and a standard deviation of one for the ML classification models. After the data pre-processing, we retained 14 features for model training and evaluation. All data pre-processing and feature selection were conducted using the open-source Python packages scikit-learn (version 0.23.2), pandas (version 1.2.1), and numpy (version 1.19.5).
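Several of the pre-processing steps above (removing zero-variance features, dropping one feature from each highly intercorrelated pair, standardizing, and randomly oversampling the minority class) can be sketched with numpy. This is an illustration under stated assumptions: the helper names and the toy count matrix are hypothetical, the order of the steps is one plausible choice, and oversampling until the classes are balanced is an assumption because the text does not state the target ratio. The regression-based ensemble selection is not reproduced here.

```python
import numpy as np


def drop_constant(X):
    """Remove features (columns) with zero variance."""
    keep = X.std(axis=0) > 0
    return X[:, keep]


def drop_intercorrelated(X, threshold=0.9):
    """For every feature pair with |Pearson r| above the threshold,
    keep the first feature and omit the second."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n = corr.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        for j in range(i + 1, n):
            if keep[j] and corr[i, j] > threshold:
                keep[j] = False
    return X[:, keep]


def standardize(X):
    """Rescale each feature to zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def random_oversample(X, y, minority_label, rng):
    """Randomly replicate minority-class rows until both classes are equal in size."""
    minority = np.flatnonzero(y == minority_label)
    n_extra = int(np.sum(y != minority_label)) - len(minority)
    if n_extra <= 0:
        return X, y
    extra = rng.choice(minority, size=n_extra, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]


# Toy count-fingerprint matrix (hypothetical): column 1 duplicates column 0
# up to a factor of 2, and column 2 is constant.
X = np.array([[0, 0, 1, 3],
              [1, 2, 1, 0],
              [2, 4, 1, 1],
              [3, 6, 1, 2],
              [4, 8, 1, 0]], dtype=float)
y = np.array([1, 0, 0, 0, 1])  # 1 = agonist (minority), 0 = non-agonist

X_sel = standardize(drop_intercorrelated(drop_constant(X), threshold=0.9))
X_bal, y_bal = random_oversample(X_sel, y, minority_label=1,
                                 rng=np.random.default_rng(0))
print(X_sel.shape, np.bincount(y_bal))
```

Constant columns are removed before the correlation step because the Pearson coefficient is undefined for a feature with zero variance.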

Model Construction and Validation

In this study, we employed eight widely used classification models, including linear and non-linear models, to predict compound bioactivity against OR5A2 and compared their performance. The eight models were as follows: naïve Bayes (NB), linear discriminant analysis (LDA), logistic regression (LR), random forest (RF), K-nearest neighbors (KNN), Gaussian process (GP), support vector machine (SVM), and XGBoost linear (XGBL). Leave-one-out cross validation (LOOCV) was applied to evaluate model performance⁴⁴. LOOCV uses the maximum number of data samples to train the ML models and leaves only a single instance as the test set in every cross-validation fold. Since every instance is used as the test set exactly once, LOOCV avoids the bias that a random split introduces in a small dataset. Model performance was evaluated by Accuracy, F1 score, and the area under the receiver operating characteristic curve (ROC_AUC). All the ML models were constructed using the open-source Python package scikit-learn (version 0.23.2). The hyperparameters of all models were left at the default values of the scikit-learn package (version 0.23.2).
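The LOOCV evaluation described above can be sketched with scikit-learn for one of the eight models. This is a minimal illustration: the feature matrix is synthetic (20 positives and 27 negatives with 14 features, mirroring only the dataset dimensions), and pooling the held-out predictions before computing the metrics is an assumption, made because ROC_AUC is undefined on a single-sample fold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic, well-separated stand-in data (not the study's fingerprints).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, size=(20, 14)),    # "agonists"
               rng.normal(-2.0, 0.5, size=(27, 14))])  # "non-agonists"
y = np.array([1] * 20 + [0] * 27)

# Default hyperparameters, as stated in the text.
model = LogisticRegression()

# Each fold trains on 46 samples and predicts the single held-out sample.
proba = cross_val_predict(model, X, y, cv=LeaveOneOut(),
                          method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)

# Metrics are computed once over the pooled held-out predictions.
accuracy = accuracy_score(y, pred)
f1 = f1_score(y, pred)
auc = roc_auc_score(y, proba)
print(f"Accuracy = {accuracy:.3f}, F1 = {f1:.3f}, ROC_AUC = {auc:.3f}")
```

The same loop can be repeated with any other scikit-learn classifier that exposes `predict_proba`, which allows the models to be compared on identical folds.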

Data Availability Statement

All data discussed in the paper are available in the manuscript or Supplemental Information.

Berger, R. G. Scent and Chemistry. The Molecular World of Odors. By Günther Ohloff, Wilhelm Pickenhagen and Philip Kraft. Angewandte Chemie International Edition 51, (2012).
Armanino, N. et al. What’s Hot, What’s Not: The Trends of the Past 20 Years in the Chemistry of Odorants. Angewandte Chemie - International Edition vol. 59 (2020).
Rossiter, K. J. Structure-odor relationships. Chemical Reviews 96, (1996).
Gautschi, M., Bajgrowicz, J. A. & Kraft, P. Fragrance chemistry - Milestones and perspectives. Chimia (Aarau) 55, (2001).
Kraft, P. & Fráter, G. Enantioselectivity of the musk odor sensation. in Chirality vol. 13 (2001).
David, O. R. P. A Chemical History of Polycyclic Musks. Chemistry - A European Journal vol. 26 (2020).
Niimura, Y., Matsui, A. & Touhara, K. Extreme expansion of the olfactory receptor gene repertoire in African elephants and evolutionary dynamics of orthologous gene groups in 13 placental mammals. Genome Research 24, (2014).
Saito, H., Chi, Q., Zhuang, H., Matsunami, H. & Mainland, J. D. Odor coding by a mammalian receptor repertoire. Science Signaling 2, (2009).
Mainland, J. D., Li, Y. R., Zhou, T., Liu, L. L. W. & Matsunami, H. Human olfactory receptor responses to odorants. Sci Data 2, (2015).
Saito, N. et al. Involvement of the olfactory system in the induction of anti-fatigue effects by odorants. PLoS ONE 13, e0195263 (2018).
Sato-Akuhara, N. et al. Ligand specificity and evolution of mammalian musk odor receptors: Effect of single receptor deletion on odor detection. Journal of Neuroscience 36, (2016).
Yasi, E. A. et al. Rapid Deorphanization of Human Olfactory Receptors in Yeast. Biochemistry 58, (2019).
Trimmer, C. et al. Genetic variation across the human olfactory receptor repertoire alters odor perception. Proc Natl Acad Sci U S A 116, (2019).
Gonzalez-Kristeller, D. C., do Nascimento, J. B. P., Galante, P. A. F. & Malnic, B. Identification of agonists for a group of human odorant receptors. Frontiers in Pharmacology 6, (2015).
Ijichi, C. et al. Odorant metabolism of the olfactory cleft mucus in idiopathic olfactory impairment patients and healthy volunteers. International Forum of Allergy and Rhinology 12, (2022).
Chéret, J. et al. Olfactory receptor OR2AT4 regulates human hair growth. Nature Communications 9, (2018).
Lu, M., Echeverri, F. & Moyer, B. D. Endoplasmic reticulum retention, degradation, and aggregation of olfactory G-protein coupled receptors. Traffic 4, (2003).
Shirasu, M. et al. Olfactory receptor and neural pathway responsible for highly selective sensing of musk odors. Neuron 81, (2014).
Ahmed, L. et al. Molecular mechanism of activation of human musk receptors OR5AN1 and OR1A1 by (R)-muscone and diverse other musk-smelling compounds. Proc Natl Acad Sci U S A 115, (2018).
Block, E. et al. Implausibility of the vibrational theory of olfaction. Proc Natl Acad Sci U S A 112, (2015).
Li, B. et al. From musk to body odor: Decoding olfaction through genetic variation. PLoS Genetics 18, (2022).
Adipietro, K. A., Mainland, J. D. & Matsunami, H. Functional evolution of mammalian odorant receptors. PLoS Genetics 8, e1002821 (2012).
Jaeger, S. R. et al. A mendelian trait for olfactory sensitivity affects odor experience and food selection. Current Biology 23, (2013).
McRae, J. F. et al. Identification of regions associated with variation in sensitivity to food-related odors in the human genome. Current Biology 23, (2013).
Mainland, J. D. et al. The missense of smell: Functional variability in the human odorant receptor repertoire. Nature Neuroscience 17, (2014).
Triller, A. et al. Odorant - Receptor interactions and odor percept: A chemical perspective. Chemistry and Biodiversity 5, (2008).
Whissell-Buechy, D. & Amoore, J. E. Odour-blindness to musk: Simple recessive inheritance. Nature 242, (1973).
Huysseune, S., Veithen, A. & Quesnel, Y. Olfactory Receptor Involved in the Perception of Musk Fragrance and the Use Thereof. (2019).
Yoshikawa, K. & Saito, N. Method for selecting odor-controlling substance. (2016).
The Good Scents Company. Available online: http://www.thegoodscentscompany.com/ (accessed 31 Jan 2022).
Surburg, H. & Panten, J. Preparation, Properties and Uses. in Common Fragrance and Flavor Materials (2016). doi:10.1002/9783527693153.
Chen, Y. et al. Dynamic accumulation of sesquiterpenes in essential oil of Pogostemon cablin. Revista Brasileira de Farmacognosia 24, (2014).
Yadav, J. S., Baishya, G. & Dash, U. Synthesis of (+)-amberketal and its analog from l-abietic acid. Tetrahedron 63, (2007).
The Pyrfume Project. Available online: https://pyrfume.org/ (accessed 31 Jan 2022).
Jabeen, A. & Ranganathan, S. Applications of machine learning in GPCR bioactive ligand discovery. Current Opinion in Structural Biology vol. 55 (2019).
Jabeen, A., de March, C. A., Matsunami, H. & Ranganathan, S. Machine learning assisted approach for finding novel high activity agonists of human ectopic olfactory receptors. International Journal of Molecular Sciences 22, (2021).
Bushdid, C., de March, C. A., Fiorucci, S., Matsunami, H. & Golebiowski, J. Agonists of G-Protein-Coupled Odorant Receptors Are Predicted from Chemical Features. Journal of Physical Chemistry Letters 9, (2018).
Rodríguez-Pérez, R. & Bajorath, J. Interpretation of compound activity predictions from complex machine learning models using local approximations and shapley values. Journal of Medicinal Chemistry 63, (2020).
Kurian, S. M. et al. Odor coding in the mammalian olfactory epithelium. Cell and Tissue Research vol. 383 (2021).
Ieki, T., Yamanaka, Y. & Yoshikawa, K. Functional analysis of human olfactory receptors with a high basal activity using LNCaP cell line. PLoS ONE 17, e0267356 (2022).
Liu, A., Ghosh, J. & Martin, C. Generative Oversampling for Mining Imbalanced Datasets. International Conference on Data Mining (2007).
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. Journal of Chemical Information and Modeling 50, (2010).
Chandrashekar, G. & Sahin, F. A survey on feature selection methods. Computers and Electrical Engineering 40, (2014).
Wong, T. T. Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. Pattern Recognition 48, (2015).

Supplemental Tables S1-S6 and the supplemental text files are not available with this version.

Primarysupplemental0725.pdf
