Association of B cell profile and receptor repertoire with the progression of Alzheimer’s disease

Alzheimer’s disease (AD) is the most common type of dementia. Numerous reports have revealed that peripheral immune systems are linked to neuropathology; however, little is known about the contribution of B lymphocytes in AD. For this longitudinal study, one hundred-thirty-three participants were included in both baseline and follow-up 2 nd year. Next, we performed next-generation sequencing (NGS)-based B cell receptor (BCR) repertoire profiling followed by pair-wise overlap analysis. The longitudinal change in the B lymphocytes population was associated with increased cerebral amyloid deposition. Furthermore, patients with AD shared highly similar class-switched BCR sequences with identical isotypes despite the high somatic hypermutation rate of BCR sequences. These commonalities of BCR repertoires of the patients with AD show the possibility of immunological stimuli by amyloid precursor protein (APP). Thus, we provide evidence for both quantitative and qualitative changes in B lymphocytes during AD pathogenesis. The genetic information of patients from the NGS-based BCR repertoire profiling can lead to the development of immune-based therapeutics and treatments for AD.


Introduction
Recent researches have shown that Alzheimer's disease (AD) phenotypes are no longer limited to the changes in the brain. There are increasing shreds of evidence that the peripheral immune systems are closely linked to AD pathology, especially critical roles of immune cells in the pathogeneses of AD 1, 2 . So far, however, there has been little discussion about the contribution of blood lymphocytes in AD. Only several papers have revealed the possible association of blood lymphocytes with the decline in cognitive function or accumulation of cerebral amyloid deposition in AD, but these studies have been conducted as a cross-sectional study using lymphocyte population data or failed to correlate specific lymphocyte characteristics with the pathology of AD 3,4,5 . Besides, although the latest technologies such as B cell receptor (BCR) or T cell receptor (TCR) next-generation sequencing (NGS) for their repertoire profiling have been developed 6,7,8 , few cases have been applied to the study for AD 9 . Especially, no single study exists related to BCR repertoire analysis in AD.
Therefore, this study aims to identify which type of lymphocytes is directly linked with brain amyloid deposition and discover characteristics of BCR repertoires shared among the patients with AD. We found that the population change in B lymphocyte over 2 years was significantly associated with the increase in cerebral Aβ deposition. Furthermore, NGS for B cell receptor followed by pair-wise overlap analysis revealed that the commonalities of BCR repertoires within AD patients but not within normal participants. We also suggest the possibility that the overlapped BCR sequences might be related to the immunological reaction to amyloid precursor protein (APP). This novel approach through both the longitudinal analysis and BCR repertoire profiling to AD provides a perspective of understanding the link between peripheral immune systems and neuropathology in the brain of AD patients.

Results
The longitudinal study revealed B lymphocyte population is associated with brain amyloid deposition in AD Fig. 1A shows the overall strategy for flow cytometry analysis (Supplementary Fig. 1) followed by the longitudinal analysis for B lymphocyte population changes. For longitudinal analyses, we sorted participants into three groups: PiB decrease group (PiB Dec ), PiB stable group (PiB Stable ), PiB increase group (PiB Inc ) based on the changes in PiB-PET SUVR for 2 years (Fig. 1A, Table 1). Stepwise multiple regression analysis showed that the population change in B lymphocyte over 2 years (ΔB) was significantly associated with the increase in brain Aβ (change of PiB standardized uptake value ratio; ΔPiB), but the changes in other lymphocytes (ΔT, ΔNK) were not (Fig. 1B). That was the decisive reason for us to start focusing on the profiling of B cells. Interestingly, PiB Inc had a higher portion of participants with an increasing number of B lymphocytes (Fig. 1C), regardless of the baseline status of Aβ accumulation (Fig. S2). Furthermore, partial correlation and ANOVA post-hoc analysis showed that ΔB was positively correlated with ΔPiB ( Fig. 2A, left) and ΔB quartile four (Q4) had the highest ΔPiB value (Q1<Q2<Q3<Q4; Fig. 2A, right). Moreover, PiB Inc had higher ΔB than PiB Dec and PiB Stable (Fig. 2B). Additional ROC curve analysis showed that ΔB significantly increases the discrimination power for the PiB increase as a variable for the logistic regression analysis (Fig. 2C). The combination of ApoE and ΔB variable had higher sensitivity (70.6%) and specificity (72.7%) with a 0.754 area under curve (AUC) than ApoE alone (0.710 AUC with 55.9% sensitivity and 76.8% specificity) (Fig. 2D). To do further validation, we classified participants depending on the ΔB quartiles (BQ; BQ1<BQ2<BQ3<BQ4), and compare their ΔPiB. As a result, BQ4 and BQ2-3 groups had higher ΔPiB than BQ1 (Fig. 2E). These results led us to focus on B lymphocyte profiling for the following studies.

Pair-wised overlap analysis followed by NGS-based BCR repertoire analysis
For deep profiling of B lymphocytes in the BCR sequence level, we randomly selected representative 13 participants (3 Normal, 10 AD) with even distribution of apolipoprotein E allele frequencies (2 ɛ3/ɛ3 and 1 ɛ3/ɛ4 for Normal; 3 ɛ3/ɛ3, 4 ɛ3/ɛ4, 3 ɛ4/ɛ4 for AD), ages (73.6 ± 1.9 for Normal, 68.6 ± 2.2 for AD), and the number of total lymphocytes (30.7 ± 3.4 for Normal, 31.6 ± 2.6 for AD) that can interfere in the analysis for B lymphocyte 4, 10, 11 (Fig.   3A). The cerebral amyloid deposition (standardized uptake value ratio, SUVR), mini-mental state exam (MMSE) cognitive score, plasma t-tau, and beta-amyloid 42/40 ratio were significantly different between normal and AD, but not age (Fig. 3B-3C). Next, we generated BCR repertoire data of these samples. In the whole BCR repertoire level, we discovered no distinctive traits shared among the AD samples, probably due to the chronic characteristics of the diseases compared to non-chronic characteristics of diseases such as auto-immune or infectious diseases (Supplementary Fig. 3 to 6) 12 . Thus, to discover characteristics shared among the AD samples, we focused our analysis on a subset of the whole BCR repertoire overlapped among the samples. Due to the limited number of BCR sequences shared among more than two patients, we conducted the overlap analysis in a pairwise manner. Overlapped BCR sequences were categorized into two groups, Normal&X (comparison between Normal and Normal or Normal and AD) and AD&AD (comparison between AD and AD) (Fig. 3A,   3D left), according to overlap pattern. We hypothesized that overlap between Normal and AD has an identical pattern with that between Normal and Normal because the two groups of samples do not share common immunological stimuli. Thus, we defined the two comparisons collectively as Normal&X and compared it with AD&AD. Interestingly, AD&AD had more numbers of the overlapped BCR sequences and lineages compared to Normal&X (Fig. 3D, right). However, in divergence and isotype composition, the two groups showed no significant difference ( Fig. 3E-3F).

Pair-wised overlap analysis with highly similar overlapped BCR sequences revealed the commonalities of its repertoires within AD patients
To exclude coincidently defined as the overlapped BCR sequences and investigate overlapped BCR sequences in a stricter aspect, we calculated inter-sample full amino acid (AA) distance (shorter distance means higher similarity) of the overlapped BCR sequences 13 .
We defined the overlapped BCR sequences by VJ gene usage (related to the mechanism of somatic recombination) 14 and complementarity-determining region 3 (CDR3; the most variable portion of Ig molecules) 15 AA sequence homology (> 70%) as previously reported 13,16 . Also, due to the limited resolution of the used criteria, BCR sequences with distinctive somatic hypermutation (SHM; a process diversifying BCRs through point-mutations into Ig genes) patterns 17 were included in the overlapped BCR sequences. Thus, we quantified the degree of SHM sharing by defining the inter-sample full AA distance as the Hamming distance calculated in full AA level between the overlapped BCR sequences from the different samples 18 . AD&AD showed a shorter inter-sample full AA distance compared to Normal&X (Fig. 4A, left).
Then, the sub-population of the overlapped BCR sequences with small inter-sample full AA distance (< 5) was extracted and defined as highly similar overlapped BCR sequences. These highly similar overlapped BCR sequences can be generated in two ways. First, for naïve B lymphocytes, gene-rearrangement could generate coincidentally similar CDR3 with some bias inherent in the process. These sequences would have a very small number of SHMs and be in IgM isotype 6,19 . Second, for activated B lymphocytes, induction of somatic mutations by antigens can produce converged BCR sequences 20,21 . In this case, the sequences have high divergence values and are class-switched to isotypes other than IgM. These highly similar overlapped BCR sequences of AD&AD showed distinctive characteristics in the aspect of divergence and isotype composition compared to those of Normal&X. Among the sequences, sequences with a high divergence despite a small inter-sample full AA distance were only detected in AD&AD, which means mutation induction in the same direction resulting in identical residues for binding occurred only in AD&AD (Fig. 4A, right). Also, the number of Ig counts were significantly increased in AD&AD group (Fig. 4B).
Furthermore, most of these sequences with high divergence values were class-switched to the isotypes other than IgM (Fig. 4C). A total of 579 sequences and 108 lineages was classswitched among the highly similar overlapped BCR sequences. Among 108 lineages, 92 lineages contained more than one class-switched BCR sequence in both repertoires with a distinctive VJ gene utilization compared to whole class-switched repertoires of the samples ( Fig. 4D and Supplementary Table 2-3). Strikingly, in 91 out of 92 lineages, BCR sequences were represented with identical isotypes in two different repertoires, which means it is feasible for these sequences to result in the same effector functions. In one lineage with different isotypes, the sequences emerged as IgA1 and IgA2, respectively (Fig. 4E). These identical characteristics of the highly similar and class-switched overlapped sequences were also reproduced in a phylogenetic analysis by confirming that the sequences from different samples were clustered into the same sub-family with identical isotypes (Supplementary Fig.   7). We concluded there is a possible response to the stimulation from the same antigen 22, 23 .
In other words, the commonalities of BCR repertoires in the patients with AD implied the possibility of immunological stimuli by common antigens.

Estimation of target proteins through in silico mapping analysis
To estimate the target proteins of the overlapped BCR sequences, we conducted in silico mapping of the overlapped BCR sequences to the BCR sequences, of which 3D structure and binding targets were defined. From the IMGT 3D structure database, we could confirm 39 unique BCR sequences and their 3D structure known to bind to AD-related proteins 24 (Fig.   5). Then, we mapped the 39 BCR sequences binding to the AD-related proteins to the overlapped BCR sequences identified in our analysis while increasing the allowed number of CDR3 amino acid mismatches during the mapping (Fig. 5A). For most of the AD-related protein-binding antibodies, which showed no significant match with the overlapped BCR sequences, there was no difference in mapping patterns between AD&AD and Normal&X.
However, AD&AD and Normal&X showed different mapping patterns for two APP-binding antibodies with the simple CDR3 amino acids sequence of GDY ( Fig. 5B-5C,   Supplementary Fig. 8-9). A total of 82 overlapped BCR sequences with the perfectly matched CDR3 amino acid sequences were detected in AD&AD compared to the case of Normal&X where seven BCR sequences were detected. Also, class-switched overlapped BCR sequences were only detected in AD&AD (40 out of the 82 overlapped BCR sequences, 48.78%). This implies the possibility that these overlapped BCR sequences having the CDR3 amino acid sequence of GDY could be related to the immunological reaction to amyloid precursor protein (APP). Fig. 6 shows a graphical summary of this study.

Discussion
This is the first investigation into the association of B cell and its receptor repertoire with AD.
We provide clear evidence for both quantitative and qualitative changes of B cell during the progression of AD pathology. Although there have been several reports suggests that the links between the population of B lymphocyte and pathology of AD or clinical dementia symptoms, these researches just relied on the cross-sectional study and the results were quite controversial. For example, one prospective cohort study including 105 participants showed no differences in the percentage of B lymphocytes per total PBMC in the patients with AD compared to healthy controls but identified an increased number of B lymphocytes producing autoantibodies against beta-amyloid in AD 25 . On the other hand, other cross-sectional studies found a decrease in B lymphocytes as well as T lymphocytes in the patients with AD 4, 26, 27 .
Even though this discrepancy has been questioned continuously, many studies entertain the concept that lymphocytic abnormality is accompanied by dementia, suggesting that the population of lymphocytes could be a biomarker for AD 28 . A recent study identified the specific intracellular signaling markers, PLCγ2 in stimulated PBMC subsets from AD patients through single cell quantification and machine learning model 29 , however, the relevance of specific cell type among PBMC to AD pathology has been still uncertain, especially B cell population. In the present study, our quantitative analyses revealed that the more increase in the number of B lymphocytes, the more brain Aβ deposition occurred. The difference from previous studies is that we investigated the direct relationship with the changes in brain beta-amyloid through our longitudinal analyses. We thought this approach could lead to additional findings that would be difficult to identify in the cross-sectional study, and we revealed the longitudinal association of B lymphocyte with the pathology of AD.
Through our longitudinal study, we realized that it is worth trying to closely monitor the characteristics of B lymphocytes themselves within AD patients. Furthermore, since many of the latest technologies such as unbiased next-generation TCR/BCR repertoire analysis or single-cell sequencing have been developed, we thought it would be interesting to apply these technologies to our cohorts. Thus, we performed NGS-based BCR repertoire analysis for the deep profiling of BCR repertoire, and consequently, our analyses for B lymphocytes in the BCR sequence level identified the commonalities of BCR repertoires within AD patients implying the possibility of immunological stimuli by the common antigens such as plasma beta-amyloid or tau proteins 22, 23 . Moreover, we found the possibility that the overlapped BCR sequences matched with the CDR3 amino acid sequence of GDY could be related to the immunological reaction to APP, which is known as the precursor molecule of beta-amyloid.
Our work has some limitations and suggestions for future study. First, the short length of CDR3 and the limited diversity the region could contribute to the prevalence of these BCR sequences. Stringent experimental methods have to be followed to prove the relevance of the mapped BCR sequences with the disease. Second, future works should concentrate to find out the cause of this phenomenon. The other concern is that we performed the BCR repertoire analysis with only 13 participants. A validation study with a larger sample size is needed to confirm these data. Alternatively, further research through the isolated human B lymphocyte culture or induced-pluripotent cell (iPSC) culture could help us find specific antigens stimulating B lymphocytes.
Our approach through both the longitudinal analysis and BCR repertoire profiling to AD suggest that the immunological response of peripheral B lymphocyte is clearly associated with neuropathology in the brain of AD patients. Furthermore, the genetic information of BCR obtained from this study using next-generation repertoire analysis techniques can lead to the development of various immune-based therapeutics and treatments for AD.

Recruitment of participants
Total 133 participants were included, and they are sorted into three groups for our longitudinal analysis: Pittsburgh compound B-positron emission tomography (PiB-PET; brain PET-imaging for the cerebral amyloid deposition) standardized uptake value ratio (SUVR) decrease group for 2 years (PiB Dec , n = 33), PiB stable group (PiB Stable , n = 66), and PiB increase group (PiB Inc , n = 34) ( Table 1). All of the subjects underwent brain imaging such as structural MRI and PiB-PET as well as psychological or clinical assessments according to the expert's guideline. The assessments corresponded to neuropsychological test

Blood sampling, isolation of PBMCs and their storage
All blood samples were collected in the morning, using K2 EDTA tubes (BD Vacutainer Systems, Plymouth, UK) after overnight fasting. The EDTA tubes were immediately centrifuged and plasma supernatants were removed. The remaining blood was mixed with PBS and transferred into Ficoll-Hypauqe solution for isolation of peripheral blood mononuclear cells (PBMC) using gradient through repeating centrifugation and PBS washing.
The collected PBMCs were immediately cryopreserved by using cell freezing buffer (90% fetal bovine serum + 10% DMSO) and stored for one week in the Mr.Frosty Freezing Container (Thermo Scientific; Waltham, MA, USA). After a week, the vials moved to the LN2 tank and stored at -192°C. The detailed information of used reagents for this study was summarized in Table S1.

Next-Generation Sequencing (NGS) library preparation
Cryopreserved CD19 + B lymphocytes were thawed and total RNA was extracted using the TRIzol Plus RNA Purification Kit (Life Technologies). 1 μg of total RNA was used as input for library preparation. Reverse transcription was performed according to the manufacturer's instructions using SuperScript Ⅳ reverse transcriptase (Life Technologies) and primers for five immunoglobulin heavy chain isotypes containing UMI (unique molecular identifiers) barcode consists of 14 random nt and partial Illumina adapters (Supplementary Table 2).
Primer annealing was carried out at 72°C for 3 min then immediately placed on ice for 2 min.
The first-strand cDNA was purified using AmPure XP beads (Beckman Coulter) at a 1:1.8 ratio, then second-strand cDNA was synthesized using KAPA HiFi HotStart DNA polymerase (Kappa Bioscience) and a pool of 6 immunoglobulin heavy chain variable regionspecific primers (Supplementary Table 3 submitted for a quality control procedure on TapeStation 2200 (Agilent Technologies).
Libraries with a single peak of correct sequence length were subjected to NGS analysis using the MiSeq platforms (Illumina, Inc.).

Consensus sequence extraction from raw NGS reads
The raw NGS reads were q-filtered by the q20p95 condition, which means 95% nucleotides of the reads have to have higher Phred scores than 20. On average, 83.51% of the reads were passed the q-filtering condition. Then, the primer regions used in the experiments were extracted from the reads while allowing one mismatch (substitution, insertion, or deletion).
Based on the location of the primer sequences, UMI sequences were extracted and the reads were clustered according to the UMI sequences. The reads in the same UMI clusters were aligned using a multiple sequence alignment tool, the Clustal Omega 1.2.4 39, 40 . By using the nucleotides frequency information of the alignment results, the consensus sequence of the UMI clusters was extracted, then the read-count of the sequences was re-defined as the number of unique UMI sequences.

Isotype annotation and functionality filtering of the consensus sequences
The constant region of the consensus sequences was used for isotype annotation. The constant region sequences were recognized in a location-based manner, then the sequences were aligned to an in-house constant gene database constructed based on the IMGT (the international immunogenetics information system) database 41 . The isotype of the consensus sequences was determined by using the alignment results. Then, the V/D/J genes and CDR1/2/3 regions of the sequences were extracted and annotated by the IgBLAST 1.8.0 42 .
After the annotation, non-functional consensus sequences were filtered-out by the following condition. 1. Shorter sequence length than 250bp, 2. Existence of stop-codon or frame-shift in full AA sequences, 3. Annotation failure in more or equal to one of the CDR1/2/3 regions.

Divergence from germline V gene segments calculation
As the first step of the divergence calculation, the BCR sequences were aligned to germline variable gene segments by using IgBLAST 42 . Based on the IgBLAST alignment results, the aligned length of the sequences to the gene segments and the number of mutations within the aligned region were calculated. We defined the divergence value as the number of mutations divided by the aligned length as reported previously 6

Pair-wise overlap analysis and inter-sample full AA distance calculation
Pair-wise overlap analysis was conducted at the BCR lineage level. To define BCR lineages, a previously reported hierarchical clustering method was used 18 . BCR lineages satisfying the following conditions from two different repertoire data were merged and defined as overlapped BCR lineages. 1. Same V/J gene usage, 2. Same CDR3 sequence length, 3. CDR3 AA sequence homology more than 0.7. In the above condition, CDR3 AA sequence homology was set as one minus CDR3 AA sequence discrepancy, which was calculated by minimum Hamming distance in CDR3 AA sequence between all the combinations of BCR sequences from two different BCR lineages divided by CDR3 AA sequence length. BCR sequences of the overlapped BCR lineages were defined as overlapped BCR sequences. To define inter-sample full AA distance, BCR sequences in the same overlapped BCR lineages were used. First, the full AA sequence of BCR sequences in the same overlapped BCR lineages was trimmed and aligned. Then, Hamming distance in the full AA sequence between all the combinations of BCR sequences from two different samples was calculated. Among the calculated Hamming distance values, the minimum value was set as an inter-sample full AA distance value of the overlapped BCR lineage.

In silico mapping of the overlapped BCR sequences to antibodies binding to AD-related proteins
First, we defined VH sequences of the antibodies binding to AD-related proteins using the IMGT 3D structure database, which led to 39 unique VH sequences 43 . Then, we mapped the identified VH sequences to the overlapped BCR sequences of our analysis based on the identity of CDR3 amino acid sequences. The overlapped BCR sequences having the same CDR3 length as that of the VH sequence of antibodies binding to AD-related proteins were identified, and CDR3 mismatch was defined as the Hamming distance between their CDR3 amino acid sequences. The CDR3 amino acid mismatch threshold was defined as the number

Statistical analyses
All statistical analyses were performed by GraphPad Prism 8 (San Diego, CA, USA) or Medcalc Software (Acacialaan, Ostend, Belgium). Categorical data were analyzed using the chi-square test. Independent t-test and ANOVA post-hoc test were used for the comparison of numerical data. The outliers were excluded by the Grubb's test. The association between the values was determined by partial correlation analyses, with the correction for the several covariates. Multiple regression analysis was also performed. To test the discrimination power of variables, logistic regression followed by receiver operating characteristic (ROC) curve analysis was conducted. For comparison of numerical values used in the pair-wise overlap analysis, the Wilcoxon rank-sum test was conducted after the Shapiro Wilk test.

Ethical approval
This study was approved under the recommendations of the Institutional Review Board (IRB) of the Seoul National University Hospital (SNUH), South Korea. Written informed consent was applied to all the participants, corresponding to the Declaration of Helsinki. Protocols and manuals were also approved by the IRB of the SNUH.

Reporting Summary
Available soon.

Data availability
The sequencing data from this study are available at NCBI under SRA accession number PRJNA667860 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA667860). Also, all other data are available upon reasonable request.

Completing interests
The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at online.

years). (B)
Step-wise multiple regression analysis between the delta (Δ) lymphocytes (B, T, NK lymphocytes) and Δ global PiB deposition (standardized uptake value ratio; SUVR). Age and gender were also included as covariates. Only B lymphocyte is selected by the stepwise method. (C) B lymphocyte changes between baseline and follow-up 2 nd year. PiB Inc group has higher portion of B cell increased-participants than PiB Dec and PiB Stable group (Chi-trend test, ## p < 0.01; Chi-squared test, **p < 0.01).   distance of the overlapped BCR sequences. The averaged hamming distance between BCR sequences, which were overlapped and derived from different samples, was calculated using full AA sequences. This distance was named as inter-sample full AA distance. Among the overlapped BCR sequences, sequences with inter-sample full AA distance value smaller than 5 were defined as 'highly similar overlapped BCR sequence'. Significantly increased divergence of highly similar overlapped BCR sequences in AD&AD group. ****p < 0.0001; Wilcoxon rank-sum test. (B) The number of Ig counts between Normal&X vs AD&AD. ****p < 0.0001; Man-Whitney test. (C) Isotype composition of the highly similar overlapped BCR sequences. Despite of a small inter-sample full AA distance, the sequences with a high divergence were only shown in AD&AD. (D) VJ gene usage of the class-switched and highly similar overlapped BCR sequences. (E) Isotype identity of the class-switched and highly similar overlapped BCR sequences. Matches in isotypes in different samples of the classswitched and highly similar overlapped BCR sequences were quantified. If the isotypes of sequences were identical in different two samples, they were counted as "identical". And, if not, they were counted as "different". The isotype of the sequences was also represented by colors following the color coding used in (C). The information of APP-specific VH sequences from the IMGT database. Figure 6. A graphical summary of this study From the quantitative and longitudinal analysis, we found that the population change in B lymphocyte over two years is associated with the increase in cerebral beta-amyloid deposition in AD. Furthermore, from the qualitative BCR repertoire analysis and in-silico mapping results, we identified that the commonalities of BCR repertoires within AD patients (Ig counts, Ig isotype composition, divergence) and suggested the possibility of the immunological reaction to amyloid precursor protein (APP). NGS, next-generation sequencing; Ab, antibody; Aβ, beta-amyloid; APP, amyloid precursor protein; Ig, immunoglobulin.  Alzheimer's disease 4 Park et al.