Harmonization of PD-L1 Immunohistochemistry and mRNA Expression Scoring in Metastatic Melanoma: A Multicenter Analysis

Background: Melanoma is a cancer that responds robustly to immunotherapy. Programmed death ligand 1 (PD-L1) testing is not required to treat patients, but most studies have demonstrated correlations between PD-L1 expression and treatment response using various assays and scoring methods. This multicenter study aimed to harmonize PD-L1 immunohistochemistry (IHC) and scoring in melanoma. To provide a reference for PD-L1 expression independent of the IHC protocol, PD-L1 mRNA expression was determined via RNAscope and compared with IHC. Methods: Standardized PD-L1 assays (22C3, 28-8, SP142, and SP263) and laboratory-developed tests (clones QR1 and 22C3) were evaluated on three IHC platforms using a training set of 7 cases. mRNA expression was determined via RNAscope (CD274/PD-L1 probe) and quantified by image analysis. PD-L1 IHC findings were scored by seven blinded pathologists using the tumor proportion score (TPS), combined positive score (CPS), and MELscore. Based on these results, a standardized method was proposed; it was validated by three blinded pathologists on a set of 40 metastatic melanomas stained using three protocols. Results: Concordance among antibody/platform combinations was high (e.g., intraclass correlation coefficient [ICC] > 0.80 for CPS), except for SP142. Two levels of immunostaining intensity were observed: high (QR1 and SP263) and low (28-8, 22C3, and SP142). Reproducibility across pathologists was higher for QR1 and SP263 (ICC ≥ 0.87 and ≥ 0.85 for TPS and CPS, respectively). QR1, SP263, and 28-8 showed the highest concordance with mRNA expression (ICC ≥ 0.81 for CPS). Accordingly, we proposed a standardized method for PD-L1 immunodetection and scoring and tested it on 40 metastatic melanomas. This method included analysis of specimen quality (e.g., host-tumor interface


Background
Melanoma is a cancer with one of the highest response rates to immune checkpoint inhibitor treatment, particularly because of its high mutational burden. Immunotherapy is used to treat advanced-stage (metastatic) melanomas; recently, adjuvant and neoadjuvant treatments have proven efficacious in stage IIb and III melanoma (1). Several studies have investigated factors that predict response to immunotherapy in patients with different types of cancer (2)(3)(4)(5). Among these factors, programmed death ligand 1 (PD-L1) protein is the most commonly recognized biomarker across cancers, including cutaneous melanoma (6).
Importantly, no single factor can sufficiently predict the clinical benefit of immunotherapy; thus, multiple biomarkers are needed, such as the tumor-immune microenvironment (7)(8)(9)(10)(11), tumor mutation burden (12)(13)(14), oncogenic alterations (7), and clinicopathological characteristics of melanoma. The efficacy of immune checkpoint inhibitors in melanoma has been demonstrated in PD-L1-positive patients (15)(16)(17) using various assays and scoring methods, including positivity cut-offs of 1% or 5% of tumor cells [TCs] (18,19) and PD-L1 expression in both TCs and immune cells [ICs] (19,20). No significant difference in overall response rate between PD-L1-positive and PD-L1-negative melanomas was observed with ipilimumab (an anti-CTLA-4 antibody), compared with pembrolizumab, nivolumab, or combination therapy (5). Because immune checkpoint inhibitor treatment also benefits PD-L1-negative patients, PD-L1 testing is not required, although the Food and Drug Administration has proposed it as a complementary test to indicate the probability of benefit from anti-PD-1/PD-L1 agents (3). Notably, across cancer types and subtypes, the use of different immunohistochemistry (IHC) assays (i.e., different primary antibodies and assay conditions) and different PD-L1 evaluation methods (i.e., different scoring methods and positivity cut-offs) complicates comparisons among clinical studies, as well as the daily work of pathologists. Currently, the main challenge for pathologists is to identify the most reproducible IHC assay (antibody/platform pair) applicable to all cancer types, particularly among laboratory-developed tests (LDTs). A standardized and reproducible PD-L1 scoring method tailored to each cancer subtype can then be applied.
In this multicenter study involving seven pathology departments, coordinated by the French Society of Pathology, we attempted to harmonize immunostaining methods and compare various scoring systems to identify an easy-to-use, reproducible method for PD-L1 assessment in metastatic melanomas. We used a training set of seven metastatic melanomas and a validation set of 40 metastatic melanomas. Moreover, to provide a reference for PD-L1 expression independent of the IHC protocol, PD-L1 mRNA expression was determined via RNAscope, then compared to IHC.

Cases
Training/standardization set: We used sections of normal tonsil tissue and seven metastatic melanomas in formalin-fixed paraffin-embedded blocks, selected to cover a range of PD-L1 expression levels (15). These were defined as follows: lymph node metastases with 100% positive staining in TCs; primary melanomas with low positive staining in TCs or ICs; primary melanomas with moderate positive staining only at the host-tumor interface (TCs or ICs); lymph node metastases with high positive staining throughout the tumor (TCs or ICs); negative subcutaneous metastases; negative primary melanomas; and hyperpigmented subcutaneous melanoma metastases with few positive TCs or ICs (< 1%). Immunostaining and RNAscope assays were performed on 10 consecutive tissue sections from each case.
The validation set of 40 metastatic melanomas included 10 primary cutaneous melanomas, 18 lymph node metastases, 11 cutaneous metastases, and 1 lung metastasis.
Data were anonymized and protected during the study. For each case, clinical data at diagnosis and follow-up were collected. Consent for the use of information and tissue samples was obtained in accordance with the French Bioethics Laws for retrospective, non-interventional research studies; this study was approved by the local ethical advisory board (available in supplementary data).
For the validation set of 40 cases, we selected the protocol with the best reproducibility on each platform (QR1 Leica Bond, SP263 Ventana, and 28-8 Dako).
Pathological assessment of PD-L1 staining and scoring
For the standardization/training set, 56 PD-L1-stained slides were analyzed by seven pathologists (JA, LL, MB, NO, LD, AF, and BV), who were blinded to the antibodies and platforms used. Pathologists used the main scoring scales from melanoma clinical trials: TPS (counting only PD-L1-positive TCs), MELscore (19), and CPS (22) (a combined score that counts positive staining in TCs and ICs). The MELscore comprises six classes: 0, no membrane staining; 1, > 0% to < 1%; 2, 1% to < 10%; 3, 10% to < 33%; 4, 33% to < 66%; and 5, ≥ 66%. We also considered a three-class MELscore (no staining or < 1%; 1-9%; and ≥ 10%) because patients with > 10% positive cells showed better responses to immunotherapy (19). CPS and MELscore were evaluated throughout the tumor and at the host-tumor interface. The host-tumor interface (i.e., invasive margin) was defined as the region extending 100 µm on both sides of the contact zone between tumor cells and the surrounding non-tumor tissue (Figure 1). Combined scores were assessed specifically in this area because it has a higher density of immune infiltrates and PD-L1-expressing cells than other tumor areas (9). For samples with heterogeneous PD-L1 expression, scores were assessed in five high-magnification fields and averaged.
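The three scales can be written down compactly. The sketch below is a minimal illustration, not the study's software: it assumes the usual 22C3 pharmDx conventions (TPS and CPS both normalized by the number of viable tumor cells, CPS capped at 100), and the function names and per-field counts are hypothetical. Heterogeneous slides are handled by averaging five high-power fields, as described above.

```python
from __future__ import annotations

from statistics import mean


def tps(pos_tumor_cells: int, viable_tumor_cells: int) -> float:
    """Tumor proportion score: % of viable tumor cells with PD-L1 membrane staining."""
    return 100.0 * pos_tumor_cells / viable_tumor_cells


def cps(pos_tumor_cells: int, pos_immune_cells: int, viable_tumor_cells: int) -> float:
    """Combined positive score: PD-L1+ TCs + ICs per 100 viable TCs (assumed cap: 100)."""
    return min(100.0, 100.0 * (pos_tumor_cells + pos_immune_cells) / viable_tumor_cells)


def melscore6(pct_positive: float) -> int:
    """Six-class MELscore: 0 none, 1 >0-<1%, 2 1-<10%, 3 10-<33%, 4 33-<66%, 5 >=66%."""
    if pct_positive <= 0:
        return 0
    for cls, upper in ((1, 1.0), (2, 10.0), (3, 33.0), (4, 66.0)):
        if pct_positive < upper:
            return cls
    return 5


def melscore3(pct_positive: float) -> str:
    """Three-class grouping used in the text: none or <1%, 1-9%, >=10%."""
    if pct_positive < 1:
        return "<1%"
    if pct_positive < 10:
        return "1-9%"
    return ">=10%"


# Hypothetical counts from five high-power fields of one heterogeneous slide:
# (PD-L1+ tumor cells, PD-L1+ immune cells, viable tumor cells)
fields = [(12, 30, 400), (0, 5, 350), (40, 60, 500), (3, 2, 420), (20, 10, 380)]
slide_cps = mean(cps(tc, ic, n) for tc, ic, n in fields)
print(round(slide_cps, 2))
print(melscore6(12.0), melscore3(12.0))
```

Averaging per-field scores (rather than pooling raw counts across fields) is one reading of "the mean score was calculated"; pooling would weight fields by their cellularity.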
For the validation set, 120 slides were analyzed (40 cases × 3 antibodies). Twenty cases immunostained with the 3 antibodies were examined by three pathologists (LL, JA, and BV) using the method proposed at the end of the training/standardization phase. All 40 cases stained with QR1 were analyzed by two pathologists (JA and BV).
PD-L1 mRNA expression was determined using the RNAscope assay with the CD274/PD-L1 probe (Advanced Cell Diagnostics, Hayward, CA, USA). All steps of RNAscope staining were performed on a Ventana Discovery Ultra automated platform using custom software (Ventana Medical Systems, Inc., Oro Valley, AZ, USA), as previously described (23). For RNAscope, 10 areas were selected in 4 representative cases by a pathologist who was blinded to the pathological and clinical data. Figure 2 shows the PD-L1 RNAscope staining results for a highly positive case. Percentages of positive TCs, ICs, and both combined (TC + IC) were analyzed. A cell with at least one dot was regarded as PD-L1 mRNA-positive. Scoring was performed at ×200 magnification. Intensity was evaluated using Advanced Cell Diagnostics scores, in accordance with the RNAscope scoring guidelines (24): 0 = no staining; 1 = 1-3 dots/TC; 2 = 4-10 dots/TC; 3 = more than 10 dots/TC, with < 10% showing dot clusters; and 4 = more than 10 dots/TC, with > 10% showing dot clusters.
Image analysis was performed in parallel. Slides were digitized using the Pannoramic 250 Flash II slide scanner (3DHISTECH, Budapest, Hungary) with a ×40 objective (resolution = 0.12154 µm/pixel) and an extended focus algorithm. For each case, one representative region of interest was manually defined, and nuclei, stained cells (at least 1 PD-L1 mRNA signal), and the number of signals were quantified using Definiens® Tissue Studio. This method allowed calculation of the percentage of stained cells, the mean number of PD-L1 signals per stained cell, and the H-score (the product of these two values).
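The image-analysis read-out above reduces to three numbers per region. The sketch below is hypothetical: it assumes a per-cell list of PD-L1 mRNA dot counts exported from the image-analysis software (the export format is not described in the text), and it also maps a typical dot count to the Advanced Cell Diagnostics (ACD) score using the thresholds quoted above, with the cluster criterion simplified to a single fraction supplied by the reader.

```python
from __future__ import annotations


def rnascope_summary(dots_per_cell: list[int]) -> dict[str, float]:
    """Percent stained cells (>= 1 dot), mean dots per stained cell, and their product."""
    if not dots_per_cell:
        raise ValueError("no cells in region")
    stained = [d for d in dots_per_cell if d >= 1]
    pct_stained = 100.0 * len(stained) / len(dots_per_cell)
    mean_signals = sum(stained) / len(stained) if stained else 0.0
    return {
        "pct_stained": pct_stained,
        "mean_signals_per_stained_cell": mean_signals,
        "h_score": pct_stained * mean_signals,  # uncapped here
    }


def acd_score(dots_per_cell: float, cluster_fraction: float = 0.0) -> int:
    """ACD score 0-4 from a typical dot count per tumor cell.

    cluster_fraction (hypothetical input) is the fraction of cells showing dot
    clusters; it only separates score 3 from score 4 above 10 dots per cell.
    """
    if dots_per_cell < 1:
        return 0
    if dots_per_cell <= 3:
        return 1
    if dots_per_cell <= 10:
        return 2
    return 4 if cluster_fraction > 0.10 else 3


# Hypothetical region with six segmented cells.
summary = rnascope_summary([0, 0, 2, 4, 0, 6])
print(summary)
```

In this toy region, half the cells carry at least one dot, stained cells carry 4 dots on average, and the uncapped H-score is 200.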

Statistical analyses
Concordance among antibodies was assessed with the intraclass correlation coefficient (ICC; 1 indicates perfect agreement), using a two-way mixed-effects model with absolute agreement and 95% confidence intervals. The same method was used to assess concordance between mRNA levels and each antibody.
Inter-observer reproducibility for TPS and CPS was assessed with the ICC using the same two-way mixed-effects, absolute-agreement model with 95% confidence intervals. We used the following scale: < 0.50, low concordance; 0.50-0.75, moderate concordance; 0.75-0.90, high concordance; and > 0.90, nearly perfect concordance.
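For readers reproducing the ICC analysis, the sketch below computes a single-measure, absolute-agreement ICC from two-way ANOVA mean squares (ICC(A,1) in McGraw and Wong's notation; its point estimate is the same under two-way random and two-way mixed models). It assumes a complete cases × raters matrix (or cases × antibodies for antibody concordance); the text does not state whether single- or average-measure ICCs were reported, confidence intervals are omitted, and the handling of the exact boundary values 0.75 and 0.90 in the interpretation helper is an assumption.

```python
from __future__ import annotations


def icc_a1(ratings: list[list[float]]) -> float:
    """ratings[i][j] = score of case i by rater (or antibody) j; no missing values."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 cases and 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between cases
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        return float("nan")  # all scores identical: ICC undefined
    return (ms_rows - ms_error) / denom


def interpret_icc(icc: float) -> str:
    """Scale used in the text."""
    if icc < 0.50:
        return "low"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "high"
    return "nearly perfect"


# Hypothetical CPS values for 3 cases scored with 2 antibodies: a constant +1
# offset between antibodies is penalized under absolute agreement.
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(round(icc_a1(offset), 3), interpret_icc(icc_a1(offset)))
```

The offset example illustrates why absolute agreement matters here: two antibodies that rank cases identically but differ systematically in intensity do not reach an ICC of 1.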
Concordance among pathologists for the MELscore, for each PD-L1 antibody, was assessed using the Fleiss kappa coefficient, which measures agreement beyond that expected by chance. A bootstrap method with 1000 resampling iterations was used to estimate the confidence intervals. For interpretation, we used the following scale: < 0, disagreement; 0.00-0.20, very low concordance; 0.21-0.40, low concordance; 0.41-0.60, moderate concordance; 0.61-0.80, high concordance; and 0.81-1.00, nearly perfect concordance.
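A minimal Fleiss kappa with a bootstrap confidence interval might look like the sketch below. Assumptions not stated in the text: every case is rated by the same number of pathologists, the bootstrap resamples cases with replacement, and the interval is the simple percentile interval; resamples in which all ratings fall into one category (kappa undefined) are skipped.

```python
from __future__ import annotations

import math
import random


def fleiss_kappa(labels: list[list[int]], n_categories: int) -> float:
    """labels[i][r] = class (0..n_categories-1) given to case i by rater r."""
    n, m = len(labels), len(labels[0])
    counts = [[row.count(c) for c in range(n_categories)] for row in labels]
    # Overall proportion of ratings falling in each category.
    p_j = [sum(c[j] for c in counts) / (n * m) for j in range(n_categories)]
    # Observed pairwise agreement within each case.
    p_i = [(sum(x * x for x in c) - m) / (m * (m - 1)) for c in counts]
    p_bar = sum(p_i) / n
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1.0:
        return math.nan
    return (p_bar - p_e) / (1 - p_e)


def bootstrap_ci(labels, n_categories, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Fleiss kappa, resampling cases with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(labels) for _ in labels]
        k = fleiss_kappa(sample, n_categories)
        if not math.isnan(k):
            stats.append(k)
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


def interpret_kappa(k: float) -> str:
    """Scale used in the text."""
    if k < 0:
        return "disagreement"
    for upper, label in ((0.20, "very low"), (0.40, "low"), (0.60, "moderate"), (0.80, "high")):
        if k <= upper:
            return label
    return "nearly perfect"


# Hypothetical three-class MELscores (0, 1, 2) from 2 pathologists on 4 cases.
ratings = [[0, 0], [0, 1], [1, 1], [1, 1]]
kappa = fleiss_kappa(ratings, n_categories=3)
print(round(kappa, 3), interpret_kappa(kappa), bootstrap_ci(ratings, 3))
```

With only four toy cases the bootstrap interval is very wide; it is shown to illustrate the resampling loop, not a realistic sample size.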
To compare TPS/CPS with MELscore results for the validation set of 40 metastatic melanomas, we converted the quantitative scores into classes and used the kappa coefficient.
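The text does not state how TPS/CPS were binned or which kappa was used. As a hypothetical illustration only, the sketch below bins a percentage score with the three-class MELscore cut-offs (< 1%, 1-9%, ≥ 10%) and computes Cohen's kappa between that binned score and the three-class MELscore assigned to the same cases.

```python
from __future__ import annotations


def to_three_classes(pct: float) -> int:
    """Hypothetical binning: 0 = <1%, 1 = 1-9%, 2 = >=10%."""
    if pct < 1:
        return 0
    return 1 if pct < 10 else 2


def cohen_kappa(a: list[int], b: list[int]) -> float:
    """Cohen's kappa for two paired classifications of the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    cats = set(a) | set(b)
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)  # chance agreement
    if p_e == 1.0:
        raise ValueError("kappa undefined: both classifications use a single class")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical CPS values and three-class MELscores for six cases.
cps_values = [0.5, 3, 12, 40, 0, 8]
mel3 = [0, 1, 2, 2, 0, 2]
print(round(cohen_kappa([to_three_classes(v) for v in cps_values], mel3), 3))
```

Because the three classes are ordered, a weighted kappa would be a defensible alternative; the unweighted form is used here only for simplicity.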
For mRNA results, we used H-scores defined as the percentage of stained cells (0 to 100) multiplied by either the Advanced Cell Diagnostics score (manual analysis) or the mean number of signals per stained cell (automated analysis). Because these products can exceed 100, scores were capped at 100 for comparison with IHC results.
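The capped mRNA H-score used for comparison with IHC can be expressed in a few lines. This is a sketch with hypothetical names: `intensity` stands for the ACD score (0-4) in manual reading or the mean number of signals per stained cell in automated reading.

```python
def capped_h_score(pct_stained: float, intensity: float, cap: float = 100.0) -> float:
    """mRNA H-score brought onto the 0-100 IHC scale.

    pct_stained: percentage of cells with >= 1 PD-L1 mRNA dot (0-100).
    intensity: ACD score 0-4 (manual) or mean signals per stained cell (automated).
    """
    if not 0.0 <= pct_stained <= 100.0:
        raise ValueError("pct_stained must be between 0 and 100")
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    # Products above the cap are truncated so mRNA and IHC scores share one range.
    return min(cap, float(pct_stained) * intensity)


print(capped_h_score(30, 2))      # manual, ACD score 2 -> 60.0
print(capped_h_score(40, 3))      # manual, product 120 is capped -> 100.0
print(capped_h_score(12.5, 4.0))  # automated, 4 signals per stained cell -> 50.0
```

Capping discards information at the high end (all strongly positive cases collapse to 100), which matters little for comparison with percentage-based IHC scores but would for ranking highly positive cases.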
The correlation between the results for each antibody (0-100%) and the PD-L1 mRNA score (percentage of staining × Advanced Cell Diagnostics score 0-4) was assessed separately for each antibody. Results are presented as the rho correlation coefficient and p-value.
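The text reports a rho coefficient; assuming this is Spearman's rank correlation, the stdlib-only sketch below computes it using average ranks for ties (frequent with 0-100% scores). The p-value is not computed here; in practice `scipy.stats.spearmanr` returns both the coefficient and a p-value.

```python
from __future__ import annotations


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho undefined for a constant sample")
    return cov / (vx * vy) ** 0.5


# Hypothetical paired values: IHC score (%) and capped mRNA H-score for five cases.
ihc = [0, 5, 5, 40, 90]
mrna = [2, 10, 30, 60, 100]
print(round(spearman_rho(ihc, mrna), 3))
```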

PD-L1 immunostaining and concordance among antibodies
We observed two levels of immunostaining intensity across the various assays and LDTs (Figure 3). Antibody/platform combinations with high-intensity staining were QR1 Dako, QR1 Ventana, QR1 Bond, and SP263 Ventana; combinations with low-intensity staining were 28-8 Dako, 22C3 Bond, 22C3 Dako, and SP142 Ventana. For the 22C3 and 28-8 assays on Dako platforms, staining was repeated in two different pathology departments (Toulouse and Bordeaux), with similar results. To study agreement between antibodies and platforms, we compared the median CPS (throughout the tumor and at the host-tumor interface) among the seven pathologists. ICC values for pairwise comparisons between antibodies for CPS throughout the tumor are shown in Table 1. Concordance of CPS for QR1 across the different platforms was nearly perfect (ICC ≥ 0.98). Concordance between antibodies was high (ICC > 0.80) except for SP142 (moderate concordance, particularly with QR1). Similar results were found for TPS and the three-class MELscore.
Reproducibility among pathologists (Table 2)
Reproducibility of TPS and CPS (throughout the tumor) was higher for the four high-intensity staining procedures (ICC ≥ 0.87 and ≥ 0.85, respectively) than for the low-intensity procedures (ICC ≤ 0.81 and ≤ 0.79, respectively). The SP142 assay had an ICC of 0.55, significantly lower than those of the other protocols. For most antibodies, CPS at the host-tumor interface was less consistent among pathologists than CPS throughout the tumor (or TPS at any location). However, 22C3 Dako and SP142 Ventana showed better reproducibility among pathologists in this specific area. We also analyzed the reproducibility of MELscores evaluated in six and three classes.
In summary, similar trends were observed among TPS, CPS, and MELscores evaluated throughout the tumor: pathologists were more consistent with antibodies producing high-intensity immunostaining than with those producing low-intensity immunostaining.
Correlation between PD-L1 mRNA expression (RNAscope) and immunostaining
Agreement between pathological assessment of PD-L1 mRNA (RNAscope) and immunostaining in the 10 representative areas (4 representative cases) is presented in the supplementary data. SP263, QR1, and 22C3 had high concordance with PD-L1 mRNA for TC staining (ICC ≥ 0.73), whereas 28-8 and SP142 had lower concordance (ICC ≤ 0.65). For IC staining, SP142 was closest to mRNA staining (ICC = 0.45). For TCs + ICs, antibodies with high-intensity immunostaining had better concordance than antibodies with low-intensity immunostaining (ICC ≥ 0.61 and ≤ 0.54, respectively). These results were confirmed by comparing PD-L1 mRNA (in 6 cases analyzed by automated whole-slide reading) with CPS (Table 3). SP142 was less concordant with PD-L1 mRNA for CPS (ICC = 0.42).
Proposed method for standardization of PD-L1 immunodetection and scoring in metastatic melanomas (Table 4): analysis of method reproducibility in the validation set (40 cases)
Table 4 (excerpt): Host-tumor interface present on specimen? If not (e.g., small biopsy), a negative PD-L1 immunostaining result cannot be affirmed. *NB: PD-L1 immunostaining sensitivity is lower with red chromogen than with brown chromogen.
After the first step of PD-L1 immunostaining and scoring comparison, we proposed a standardized method for PD-L1 assessment (Table 4 and Figure 4) that is suitable for all antibodies and platforms (except SP142) and could improve reproducibility among pathologists. A validation set of 20 cases was studied with this method by three pathologists (LL, JA, and BV), using the staining protocol with the highest inter-observer reproducibility on each platform (QR1 Leica Bond, SP263 Ventana, and 28-8 Dako). All 40 cases stained with QR1 were analyzed by two pathologists (JA and BV). Pigmentation was observed in 12 of the 40 cases, but only 4 of these were considered difficult to interpret. The host-tumor interface was absent in only 2 of the 40 cases.
The best concordance among pathologists was achieved for QR1 and SP263 (high-intensity protocols) (Table 5a). This concordance for QR1 was confirmed in the 40 cases (presented in the supplementary data). On kappa analysis, concordance was higher for the MELscore than for the other scores (moderate for TPS and CPS) (Table 5b). For each patient/pathologist pair, concordance across the three antibodies was excellent for all criteria studied (Table 6).

Discussion
Harmonizing PD-L1 immunostaining and scoring is important for the use of PD-L1 as a biomarker of immunotherapy response, as well as for studies of the prognostic effect of PD-L1 in melanoma (25,26). Differences in PD-L1 immunostaining intensity can be attributed to technical factors; these differences are reproducible when the same technique and platform are used by different laboratories (27)(28)(29). For companion tests such as 22C3 and 28-8 (companion tests for pembrolizumab and nivolumab, respectively), PD-L1 staining has been harmonized and validated (30). Therefore, pathological assessment should take into account the antibody used (low- or high-intensity staining). High-intensity immunostaining has the advantages of simpler quantification and greater reproducibility among pathologists. To our knowledge, no studies have investigated correlations between response to immunotherapy and PD-L1 expression assessed with such high-intensity antibodies (QR1, SP263) in melanoma. Most clinical studies of melanoma have used 22C3 Dako and 28-8 Dako, with which positive staining has been found to predict better treatment response (18,19). Consequently, exclusive use of high-intensity antibodies is not recommended in melanoma; the low-intensity 22C3 Dako and 28-8 Dako assays must be used to establish positive results. The use of SP142 is questionable: it was less reproducible in the present study, and it has not been clinically proven to predict response to immunotherapy. Therefore, we do not recommend its use for PD-L1 evaluation in melanoma.
Moreover, we found that PD-L1 mRNA expression for TCs + ICs (number of pink dots per cell, reflecting mRNA levels) had better concordance with the high-intensity antibodies (QR1 and SP263). Because mRNA detection is independent of the IHC protocol, these results support further studies of these antibodies. RNAscope has been used to study PD-L1 in formalin-fixed paraffin-embedded cancer tissue samples, showing good concordance (31,32). Concordance of mRNA with different PD-L1 immunostaining protocols (and antibodies) has been demonstrated in lung cancer (27). In melanoma, concordance between IHC and RNA expression has been reported for the E1L3N and SP142 antibodies (IHC) and the NanoString analysis system (mRNA) (29). Post-translational protein modifications and other factors can cause discordance between protein expression and mRNA levels. It remains unclear whether mRNA expression can serve as a "gold standard" for assessing PD-L1 expression and predicting response to immunotherapy; moreover, this technique is more difficult and expensive than IHC.
Based on our results, we proposed a standardized method for PD-L1 immunodetection and scoring in melanoma (Table 4). First, it is important to analyze specimen quality. Occasionally, melanin pigmentation can interfere with PD-L1 scoring in melanoma (33). In our validation set, only 4 cases (10%) were difficult to interpret because of pigmentation; these cases required red chromogen. In our experience, PD-L1 immunostaining with red chromogen is less sensitive than with brown chromogen, consistent with previous findings (33). Accordingly, we recommend excluding such pigmented areas, if possible. It is also important to identify the host-tumor interface in the specimen, because this area has a higher density of immune infiltrates and PD-L1-expressing cells than other tumor areas (9). If the host-tumor interface is absent (e.g., in a small biopsy), negative PD-L1 immunostaining cannot be confirmed. This information has not been specified in previous clinical immunotherapy studies of melanoma; its absence may explain the good response observed in some PD-L1-negative melanomas.
Second, we recommend analyzing PD-L1 throughout the tumor, even though melanocytes and ICs at the host-tumor interface may show more intense PD-L1 staining (9). Despite a precise definition agreed by consensus among our seven pathologists, CPS and MELscores at the host-tumor interface were less consistent than scores based on analysis of the entire tumor. Moreover, Dupuis et al. (15) reported that PD-L1 expression assessed over the entire primary melanoma surface was associated with objective response rate (35.7% vs. 5% for PD-L1-positive and -negative staining, respectively; p = 0.02), whereas findings differed markedly when PD-L1 status was evaluated at the host-tumor interface. Consequently, evaluating PD-L1 status specifically at the host-tumor interface appears less relevant than assessment throughout the tumor in melanoma.
Finally, assessment of PD-L1-positive staining in melanoma should include both PD-L1 expression in TCs and a combined analysis of TCs and ICs (using MELscore or CPS). Indeed, previous studies have reported better responses to immunotherapy for PD-L1-positive TCs (threshold of 1% or 5%, depending on the study (16,18)) and for PD-L1-positive staining in both TCs and ICs (20,34). Without double staining, it is difficult to count only TCs (and not PD-L1-positive macrophages) at the host-tumor interface. PD-L1/CD163 and PD-L1/SOX10 double staining suggested that some apparently PD-L1-positive tumor cells are in fact PD-L1-negative tumor cells surrounded by PD-L1-positive histiocyte cytoplasmic extensions (15). A combined score counting both TCs and ICs avoids this problem and is also more reproducible among pathologists. The MELscore was initially divided into six classes. However, Daud et al. (19) found that a MELscore ≥ 3 (> 10% positive TC or IC staining) was associated with the best overall response rate after pembrolizumab treatment, leading to the use of a three-class MELscore, which was more reproducible than the six-class version. Moreover, in our validation set of 40 cases, concordance for the three-class MELscore was higher than for the other scores (moderate for TPS and CPS). Notably, the study by Daud et al. is the only report comparing MELscore with response to immunotherapy (19).
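The three steps discussed above (specimen quality, whole-tumor analysis, combined score) can be summarized as a checklist. The sketch below is a hypothetical encoding of the steps described in this Discussion, not a reproduction of Table 4: the `Specimen` fields are invented for illustration, and treating < 1% as "negative" mirrors the lowest three-class MELscore category.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Specimen:
    antibody: str                       # e.g. "QR1", "SP263", "28-8", "22C3"
    host_tumor_interface_present: bool
    pigment_hinders_reading: bool
    pct_positive_whole_tumor: float     # combined TC + IC percentage, entire tumor


def assess(s: Specimen) -> list[str]:
    """Return report notes following the steps described in the text (hypothetical)."""
    if s.antibody.upper() == "SP142":
        return ["SP142 is not recommended for PD-L1 evaluation in melanoma"]
    notes = []
    # Step 1: specimen quality.
    if s.pigment_hinders_reading:
        notes.append("pigmented areas: exclude if possible (red chromogen is less sensitive)")
    # Steps 2 and 3: combined score over the whole tumor, three-class MELscore.
    pct = s.pct_positive_whole_tumor
    cls = "<1%" if pct < 1 else ("1-9%" if pct < 10 else ">=10%")
    notes.append(f"MELscore (3 classes, whole tumor): {cls}")
    if cls == "<1%" and not s.host_tumor_interface_present:
        notes.append("no host-tumor interface: a negative result cannot be affirmed")
    return notes


# Hypothetical small biopsy without a host-tumor interface.
for note in assess(Specimen("QR1", False, False, 0.5)):
    print(note)
```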

Conclusion
This study proposes a standardized and reproducible method (Table 4) for PD-L1 immunodetection and scoring in melanoma. The method can be applied using the same PD-L1 IHC protocols as in other cancers, which can simplify daily practice. We recommend using QR1, SP263 or 28-8, and 22C3 with a combined score that analyzes the entire tumor surface. Difficulties specific to melanoma (technique, choice of antibody and platform, scoring, TCs versus ICs, specimen type, and presence of a host-tumor interface) may explain contradictory results regarding PD-L1 expression and its correlation with response to immunotherapy (3,35). Other biomarkers are also promising, and combinations of such markers are likely to guide treatment decisions (5). At present, a PD-L1 companion test is not required for treatment of metastatic melanoma. However, expansion of immunotherapy indications could lead to more therapeutic options. If PD-L1 expression is to be used to guide adjuvant or neoadjuvant treatment decisions, a reliable and reproducible method (e.g., the approach presented in this study) should be used. The standardized method we propose represents a first step toward reproducible PD-L1 analysis, particularly for LDT antibodies, given that QR1 correlated most closely with PD-L1 mRNA expression. Our method and its ability to predict response to immunotherapy should be tested in larger prospective studies.