Patient recruitment
For the multiome analysis, we recruited patients undergoing radical nephro-ureterectomy or radical nephrectomy at the Western General Hospital, Edinburgh. We recruited patients with renal cell carcinoma (n = 6) or oncocytoma (n = 1) and normal renal function (eGFR > 75ml/min/1.73m2) as controls and in addition we recruited 5 patients with transitional cell carcinoma of the ureter leading to ureteric obstruction as determined by the presence of hydronephrosis on CT imaging.
For spatial transcriptomics, we utilised surplus formalin-fixed, paraffin-embedded tissue from kidney biopsies from patients with IgA nephropathy (n = 6) or minimal change disease (n = 3) that were performed for clinical purposes. In addition, we employed 4 wedge biopsies from nephrectomy specimens removed due to advanced pyelonephritis.
Renal function at the time of nephrectomy or biopsy and during follow-up was estimated from serum creatinine results provided by the clinical lab by applying the CKD-EPI equation without correction for race(46). Urinary protein:creatinine results were provided by the clinical lab.
Use of both the nephrectomy and biopsy samples was approved by the steering committee of the National Research Scotland Lothian Bioresource (REC 20/ES/0061, studies SR1651,SR1887).
Animal experiment
Male C57BL/6JCrl mice were purchased from Charles River Laboratories and housed in a pathogen-free environment at the University of Edinburgh animal facility. All procedures had prior approval of the Animal Ethics Committee, University of Edinburgh and were conducted in accordance with the United Kingdom Animals Scientific Procedures Act 1986 and the ARRIVE guidelines.
IRI surgery was performed as previously described(47). Briefly, a posterior flank incision was made and the left renal pedicle identified and clipped using an atraumatic clamp for 15 minutes. During the ischemic period, body temperature was maintained at 37°C using a heating blanket with homeostatic control (Harvard Apparatus) via a rectal temperature probe. The clamp was then removed, the peritoneum closed with 5/0 suture, and the skin closed with clips. In sham surgery, the mice underwent flank incision, however the renal pedicle was not clamped.
T-5224 was reconstituted in dimethyl sulfoxide (DMSO) and diluted 1:20 in sunflower oil. 3 days post-IRI mice in the respective group were given 10mg/kg T-5224 (APExBIO, B4664) by gavage daily for 12 days. Animals were maintained for 15 days post-IRI before tissue harvest. IRI or respective control kidneys were weighed, and one third was fixed for 24 hours in 10% neutral buffered formalin with the remainder snap-frozen in dry ice.
Histological analysis
After fixation and paraffin embedding, human and mouse tissues were sectioned at 4µm depth from FFPE embedded tissue blocks and adhered to Superfrost Plus slides (VWR, 631 − 0108). Sections were deparaffinized two times 5 minutes in xylene and rehydrated in 100%, 90% and 70% ethanol for 2 minutes each and placed in distilled water. PSR staining was performed using the Picro Sirius Red Stain Kit (Abcam, ab150681) following manufacturer guidelines. After rehydration of tissue sections, the tissue was covered picro sirius red solution for 60 min. Slides were rinsed in acetic acid solution and dehydrated in absolute ethanol before mounting. For macrophage quantification, murine kidney sections were incubated with the avidin/biotin blocking kit (Vector Laboratories, SP2001) and blocked with serum-free protein block (Dako, X0909) before incubation overnight at 4°C with IBA1 antibody (Invitrogen, PA5-27436) diluted 1:250 in antibody diluent (Dako UK Ltd, S202230). After washing, the sections were incubated with polyclonal goat anti-rabbit biotinylated secondary antibody (DAKO, E0432, 1:300 dilution) for 30 minutes at room temperature. Vectastain RTU ABC Reagent (Vector Laboratories, PK7100) was then applied, followed by incubation with the DAB1 Substrate Chromogen System (Dako, K3468), and then counterstaining with hematoxylin before dehydration and mounting with Pertex mounting medium (Histolab Products AB, 3808707E).
PSR stained sections were scanned with a Zeiss AxioScan Z1 slide scanner (Carl Zeiss). Images of human kidneys were reviewed by a consultant pathologist who confirmed the absence of tumour infiltration and provided a semi-quantitative assessment of glomerulosclerosis and tubulointerstitial inflammation and fibrosis. For mouse tissue, PSR staining of renal cortex was quantified in QuPath (v0.4.2). A threshold classifier was used to calculate cortex area with intensities above the thresholds denoting collagen fibres and this was assessed as a percentage of total tissue area. The percentage of total renal cortex that stained with IBA1 was quantified using a minimum threshold for positive staining on ImageJ analysis using a minimum of 5 x20-magnification images of renal cortex.
qPCR
Total RNA was extracted from kidney tissue using the miRNeasy Mini kit (Qiagen, 217004), following the kit instructions. cDNA for quantitative PCR was synthesised from the extracted RNA using the High-Capacity cDNA Reverse Transcription kit (Applied Biosystems, 4368814). Quantitative PCR was performed using TaqMan Universal Master Mix II (Applied Biosystems, 4440040) and TaqMan Gene Expression Assay-specific primers (Table S19) and normalised to Peptidylprolyl isomerase A (Ppia) expression.
DNA extraction and genotyping
DNA was extracted from approximately 25 mg of nephrectomy tissue using the DNeasy Blood & Tissue Kit (Qiagen, 69504) according to the manufacturer instructions. Briefly, tissue was cut into small pieces and incubated with proteinase K for 3 hours at 56°C. Genomic DNA was purified using a DNeasy Mini spin column, washed and eluted with TE buffer. 200 ng of genomic DNA per sample was used to genotype samples using an Infinium Global Screening Array (Illumina, 20030770) according to manufacturer instructions. Raw genotyping data was converted to variant call format (VCF) using GenomeStudio (Illumina, v2.0.3).
Single-nucleus multiome RNA and ATAC sequencing
Nuclei isolation, library preparation and sequencing
For single-nuclei multiome studies wedge biopsies were taken immediately after nephrectomy from the pole opposite the tumour in those with intra-renal tumours and samples snap-frozen in dry ice and a separate piece was fixed for 24 hours in 10% neutral buffered formalin before paraffin embedding for histological analysis. Nuclei were isolated from snap-frozen nephrectomy samples using an adapted 10x Genomics Demonstrated protocol (CG000375, Nuclei Isolation from Complex Tissues for Single Cell Multiome ATAC + Gene Expression Sequencing). Using a scalpel, the tissue was cut into approximately 25 mg of rice-grain sized pieces while avoiding thawing and all further steps were performed at 4°C. Initially, 150 µl lysis buffer containing 10 mM Tris-HCl (Sigma-Aldrich, T2194), 10 mM NaCl (Sigma-Aldrich, 59222C), 3 mM MgCl2 (Sigma-Aldrich, M1028), 0.05% NP40 Substitute (Sigma-Aldrich, 74385), 1 mM DTT (Sigma-Aldrich, 646563) and 1 U/µl Protector RNase inhibitor (Sigma-Aldrich, 3335402001) in Nuclease-free Water (Invitrogen, 10429224) was added and tissue was homogenised using a sterile pestle (Fisher Scientific, 12-141-368). Once homogenised another 850 µl lysis buffer was added, and tubes were placed on ice for 5 minutes. The suspension was strained using a 40 µl filter (Cambridge Bioscience, 43-10040-60), centrifuged (500 g, 5 min, 4°C) and 1 ml 2% BSA (Miltenyi Biotec,130-091-376) in phosphate-buffered saline (PBS) (ThermoFisher, 10010023) was added without disrupting the pellet. After 5 minutes nuclei were resuspended, centrifuged (500 g, 5 min, 4°C) and finally resuspended in 300 µl of 2% BSA in PBS.
Nuclei were stained with 10 µl of 7-AAD viability staining solution (Thermo Fisher Scientific, 00-6993-50) and sorted on a BD FACSAria fusion (Becton Dickinson) cell sorter fitted with a 100 ml nozzle into a tube prepared with 200 µl of 10% BSA in PBS supplemented with 50 µl RNase inhibitor for a post-sorting volume of 2 ml. Between 3 and 5 samples from individual donors were pooled at pre-defined proportions based on counts of 7-AAD + nuclei (Fig. S2B) and a total of 500,000 nuclei were sorted per pool (Fig. S2A).
Sorted nuclei were centrifuged (500 g, 5 min, 4°C) and resuspended in 100 µl lysis buffer with 0.01% NP40 Substitute, 0.01% Tween-20 (Bio-Rad, 1662404), 0.001% Digitonin (Thermo Fisher Scientific, BN2006) and 2% BSA. After 2 minutes wash buffer (10 mM Tris-HCl, 10 mM NaCl, 3 mM MgCl2, 2% BSA, 1 mM DTT, 1 U/µl RNase inhibitor, 0.1% Tween-20) was added and nuclei were centrifuged (500 g, 5 min, 4°C) and resuspended in 20 µl diluted nuclei buffer (20X Nuclei Buffer, 1 mM DTT, 1 U/µl RNase inhibitor in nuclease-free water).
Nuclei suspensions were counted using a LUNA-FX7 cell counter (Logos Biosystems) and processed immediately using the Chromium Next GEM Single Cell Multiome ATAC + Gene Expression kit (v1.0, 10x Genomics) according to manufacturer protocol (CG000338, User Guide Chromium Next GEM Single Cell Multiome ATAC + Gene Expression). Briefly, per library between 20,000 to 40,000 nuclei were input into the transposition reaction before partitioning into gel beads. After reverse transcription and pre-amplification, aliquots of the pool of transposed DNA and cDNA were used to generate separate ATAC and GEX libraries.
After quantification GEX and ATAC libraries were pooled at molarity ratio of 40:60 respectively and sequenced using an Illumina NovaSeq6000 flow cell (with the read lengths R1:151bp I1:10bp I2:24bp R2:151bp) for a target yield of approximately 50,000 reads/cell and 75,000 reads/cell for gene expression and ATAC libraries respectively. In total 6 libraries comprising a total of 7 individual control and 5 UUO individual samples were prepared and analysed.
Sequencing data processing and donor demultiplexing
BCL files from Illumina sequencing runs were demultiplexed for GEX and ATAC samples using cellranger-arc mkfastq (v2.0.2, 10x Genomics). The base mask Y28n*,I10n*,I10n*,Y151 and --filter-dual-index or Y100n*,I8n*,Y24,Y100n* and --filter-single-index was used to generate GEX or ATAC FASTQ files respectively. Reads were aligned to the precomputed GRCh38 (hg38) reference genome provided by 10x Genomics, and quantification as well as joint cell calling was performed using cellranger-arc count (v2.0.2).
Due to donor pooling, each library included cell barcodes from multiple different samples. To allocate barcodes to their original donor the cellsnp-lite (v1.2.2)(48) and vireo (v0.2.3)(49) packages were used. Barcodes passing cellranger-arc filters were genotyped independently for GEX and ATAC libraries by searching for SNPs present in sequencing reads using cellsnp-lite with the options --minMAF 0.1 (and --UMItag None for ATAC files) using a reference SNP list of 7.4 million SNPs from the 1000 Genomes Project, filtering for a minor allele frequency > 5% (provided by cellsnp-lite). To infer the original donor of each nuclei barcode using genetic variants that segregate between samples, a Bayesian model implemented by vireo was used (with option –N equal to the number of donors in the library). As donors were predicted independently for GEX and ATAC libraries, conflicts in donor assignments arising in a minority of nuclei barcodes were resolved using the following logic: i) barcodes assigned as doublet in either modality or unassigned in both modalities were removed (18.36% of total); ii) barcodes which were unassigned in one modality but not the other were assigned to the donor predicted by the successful modality (1.55% of total); iii) barcodes with conflicting donor assignments were removed (98.68% matching, 1.32% conflicting; % of remaining after previous filters) (Fig. S2H).
To link predicted donors with physical samples two parallel approaches were used. Firstly, using the original tissue, genomic DNA was extracted and genotyped using an Illumina an Infinium Global Screening Array as described above. VCF files containing the predicted genotype of each donor-library combination were loaded in R (v4.3.1) and subset to SNPs commonly detected in all libraries. To intersect the predicted genotypes with the SNP array, both sets of SNPs were further reduced to a common set and a mean distance matrix was calculated based on the Hamming distance at each genomic location (with 0 being a homozygous reference allele, 2 a homozygous alternative allele and 1 being a heterozygous allele) (Fig. S2J). Secondly, a subset of samples was pooled in multiple libraries creating a distinctive pooling pattern which can be used to identify each sample (i.e. the same sample pooled in two different libraries is expected to have the same genotype, as predicted by vireo) (Fig S2I). VCF files were processed as previously and the mean Hamming distance between each donor-library pair was calculated, showing a lower distance between identical sample used in multiple libraries (Fig. S2K).
Joint dimensionality reduction, clustering and cell annotation
While SNP-based donor demultiplexing allows identification of doublets with distinct genotypes, doublets of nuclei from the same donor remain undetected. Therefore, we used an iterative approach to identify undetected doublets, initially retaining SNP doublets to aid identification. Seurat (v4.4.0)(50) and Signac (v1.11.0)(51) were used to load GEX data using Read10X_h5() and a Seurat object was created using the CreateSeuratObject function without filters. The data was normalised using SCTransform with options vst.flavor = "v2" and principal components were computed using RunPCA with npcs = 100. Different dimensionality reduction and clustering setting were used to compute different representations of the data with RunUMAP, FindNeighbors (with dims between 1 and 80) and FindClusters (resolution between 0.5 and 4). At each iteration presumed doublet subclusters were marked based on the criteria: i) enrichment of doublets identified based on SNP demultiplexing (presence of SNPs from different genetic backgrounds); ii) increased number of unique molecular identifiers (UMIs) or unique genes per barcode; iii) co-expression of multiple markers from distinct cell lineages (e.g. PTPRC, FLT1, MME, SLC12A1). Inversely, all barcodes labelled as doublet were separately clustered and previously misclassified doublets were retained.
After initial doublet identification, the full GEX and ATAC datasets were loaded, and doublets or barcodes not assigned to a donor were removed. Further, any gene expressed by less than 20 nuclei barcodes was removed. Additionally, barcodes with i) unique molecular identifiers (UMIs) < 300 or > 12,000; ii) unique genes > 4,000; iii) mitochondrial gene content > 10%; iv) number of ATAC fragments < 700 or > 100,000; v) number of unique peaks < 300 or > 40,000; vi) a nucleosomal signal score (calculated with NuceosomeSignal) < 2; vii) a TSS enrichment score (calculated with TSSEnrichment) < 2 were removed (Fig. S2D-G, S3A,D).
Next, dimensionality reductions were independently computed based on GEX and ATAC datasets. In both cases datasets were split by the donor sex using SplitObject. Data was normalised using SCTransform with option vst.flavor = "v2" as previously and integration anchors were computed using FindIntegrationAnchors with the top 3,000 shared variable features. The data was integrated using IntegrateData with dims = 1:80 and the UMAP embedding was computed with dims = 1:80. Similarly, the ATAC data was processed using FindTopFeatures, RunTFIDF, RunSVD with n = 100 before splitting by donor sex, computing anchors using all peaks, and integrating data with IntegrateData with dims = 1:80 and RunUMAP with dims = 2:80). Finally, the Seurat weighted nearest neighbor (WNN) analysis was used to compute a joint dimensionality reduction using FindMultiModalNeighbors with options dims.list = list(1:100, 2:100) and k.nn = 60) in the PCA and LSI embeddings of the GEX and ATAC datasets respectively. The joint UMAP was computed using the WNN graph with RunUMAP option nn.name="weighted.nn" and barcodes were clustered using FindClusters with options resolution = 4 and graph.name = "wsnn".
In order to refine the peak set and detect peaks present in small cell populations clusters were tentatively annotated (pending the final annotations) and the Signac CallPeaks wrapper function for MACS2 (v2.2.7.1)(52) was used to call cluster specific peaks. Peak sets for each cluster were further merged and pruned by removed peaks in scaffold and blacklisted regions. Finally, a peak by nuclei barcode count matrix was generated using the FeatureMatrix function with the new peaks regions as features. Following this, ATAC and WNN dimensionality reductions were recomputed as before, and barcodes were re-clustered (Fig. S3B). To determine final cluster annotations barcodes were first broadly clustered using the FindClusters function with lower resolution settings. Broad cluster (level 1 annotations) were further iteratively subclustered by re-computing dimensionality reductions and sub-clusters within a larger cluster. Marker genes for each cluster were determined using the FindMarkers function and remaining subclusters co-expressing signatures of distinct cell types were removed as doublets. Each subcluster was manually annotated (level 2 annotations) according to gene markers determined by prior human kidney atlases(4) (Fig. S6A, Table S4). We defined injured epithelial clusters based on a proportional enrichment in UUO samples compared to control samples as well increased expression of generic injury genes such as PROM, DCDC2, SPP1, ITGB6, ITGB8 (Fig. 1C-1E, Table S5,8). Subclusters of PT and LOH nephron segments were identified as inflammatory cell states based on the expression of injury markers as well expression of chemokines, extracellular matrix remodelling and adhesion factors and markers of cell cycle arrest (Fig. 2C). Individual chemokines were identified in other injured nephron segments, however they lacked a consistent signature of multiple inflammatory markers and sufficient separation from injured counterparts (Fig. 2J). Agreement between GEX and chromatin profile of markers genes was assessed using the CoveragePlot function in genomic loci of marker genes determined by GEX (Fig. S4, S5).
All further basic graphs showing gene expression, chromatin state, and meta data features were generated using Seurat and Signac functions DimPlot, FeaturePlot, DotPlot, VlnPlot and CoveragePlot.
Integration with KPMP snCv3 dataset
The full (as of September 2023) The Kidney Precision Medicine Project (KPMP) snRNA-seq dataset along with associated clinical data were downloaded from the KPMP website (https://atlas.kpmp.org). Integration of the snCv3 dataset data was performed using Seurat with the multiome dataset as reference. Transfer anchors were computed with FindTransferAnchors using dims = 1:50 with the multiome pre-computed GEX principal components as reference. Subsequently, the snCv3 data along with prediction of respective multiome level 1 and level 2 cell annotations were projected on the WNN UMAP reduction using the MapQuery function (Fig. S6A). Agreement of original snCv3 annotations multiome annotations were assessed using sankeyNetwork graphs implemented by the networkD3 package (v0.4) (Fig. S6B, H). To assess the presence of inflammatory PT cells, the AddModuleScore_UCell function from the UCell package(53) was used to score each cell using a signature derived from inflammatory PT cells in the multiome dataset (Fig. S6E) and the predicted level 2 annotations were used to compare expression patterns of inflammatory PT markers in both datasets (Fig. S6F).
Differential abundance analysis
Abundance of different cell types in samples was calculated as proportion of total cells or proportion of total epithelia cells (non-Immune/Endothelial/Interstitial cells) to normalise for shifts in overall composition due to immune cell influx in UUO samples. Comparisons between cell populations in control and UUO samples were performed using a Wilcoxon signed-rank test using the R function wilcox.test with the option alternative = "two.sided" (Fig. 1E, 2B, S9B, F).
RNA velocity and pseudotime analysis
Count matrices of spliced and unspliced reads were generated from cellranger GEX BAM files for each library using velocyto (v0.17.15)(54). The gene model from the 10X hg38 reference genome along with a mask for repetitive regions downloaded from the UCSC genome browser were passed to the velocyto run command along with BAM files and a whitelist of nuclei barcodes retained in the final analysis. RNA velocity streams were inferred independently for PT, other epithelial and myeloid cell subclusters using scVelo (v0.2.5)(55). Velocyto loom files were imported using the scv.read function and UMAP embeddings and cell annotations were imported to the AnnData object. The data was normalised according to the standard workflow and moments across 50 nearest neighbours in the first 50 principal components were computed. Splice kinetics were then modelled using a likelihood based dynamical models and visualised using pl.velocity_embedding_stream (Fig. 2A, 4C, S9A, E).
To infer pseudotime orderings for PT, LOH and myeloid cell subsets we used the slingshot R (v2.8.0) package(56). For PT trajectories we considered healthy PT S1-S3 as start points and PT Inflammatory cells as end point. Similarly, for LOH trajectories cTAL1, cTAL2 and mTAL clusters were considered as starting point and LOH Inflammatory as end point, while ignoring Macula Densa cells. Two trajectories towards cDC2 and Activated Macrophages each starting from CD14 monocytes were specified for myeloid trajectories. The getLineages with relevant start and end clusters was used to construct the minimum spanning tree and getCurves was used to get a smoothed ordering of cells along the pseudotime trajectory. Gene expression and chromatin accessibility dynamics correlated with the inferred pseudotime values were identified using the tradeseq package, using the fitGAM function to fit regression models to counts and pseudotime values. Normalised count values of correlated genes or peaks were then extracted from the Seurat object and smoothed using the smoothing function (with strength = 0.2) from the modelbased R package (v0.8.6.3). Heatmap plots of smoothed dynamics were then visualised using the pheatmap package (v1.0.12) (Fig. 2E, 4D, 6B, S9D).
Gene set analysis
We defined differentially expressed genes between PT Injured/Inflammatory and PT S1-S3 cells using the FindMarkers Seurat function. Up- or downregulated genes with an absolute average log2-fold change > 0.25 and adjusted p-value < 0.05 were forwarded for over-representation analysis implemented by the clusterProfiler R package (v4.10.0)(57). Using the enrichGO function enrichment for GO biological process terms with < 600 genes were tested. To project the most significantly enriched non-redundant GO terms onto the pseudotime trajectory, genes associated with each GO term were extracted and cells were scored using the AddModuleScore_UCell function. A locally estimated scatterplot smoothing curve was fitted to scaled GO scores as implemented by geom_smooth with a span smoothing strength of 0.4 (Fig. 2F).
Gene regulatory network analysis
We used the SCENIC + python (v1.0.0) package(58) for gene regulatory network inference and TF analysis. For this, a custom cistarget database overlapping with ATAC peaks found in the dataset was created by generating a FASTQ file containing peak regions using the bedtools getfasta command. The create_cisTarget_databases.py script was then used with the SCENIC + motif collection (provided by SCENIC+) to score motif sequences in each ATAC peak.
Next, the Seurat dataset was subsampled to < = 1000 cells per level 2 annotation and exported to python (v3.8.0). The remaining SCENIC + workflow was followed with minor alterations. Briefly, GEX and ATAC were pre-processed with default settings, chromatin topic modelling was performed using pycisTopic, and 70 topics were selected for the dataset. PycisTarget was then used to find motif enrichments in accessible chromatin regions, enhancer to gene relationships and TF to gene relationships were calculated and the gene regulatory network was constructed using SCENIC + functions.
The derived regulons consisting of TF-CRE-gene links were further pruned by removing regulons with negative CRE to gene correlations and regulons with < 15 target genes. In total we detected filtered regulons for 474 unique TF motifs found in 109147 unique regions linked to 11567 genes. The filtered regulons were exported to R and cells were independently scored by calculating scores for TF gene targets and CRE targets of each TF using the AddModuleScore_UCell function. A combined TF score for each TF in each cell was calculated as mean of the gene target and CRE target scores. For all further analysis of gene regulatory networks in specific cell types (e.g. PT cells) only genes and CRE expressed or accessible in the given cell type were considered to generate cell-type specific regulons. Genes were considered expressed in the cluster if i) for genes with global expression level below expression by 10% of cells the gene is expressed by at least 2% of cells in a subcluster ii) for genes with a global expression level above 10% it is expressed by an equal or higher proportion of cells than the global expression level iii) it is expressed by more than 20% of cells. Similarly, regions were considered accessible if the same thresholds were met. Other regions and genes were considered not expressed and removed from the analysis. For linkage analysis between TFs and target genes the linkages between TFs, CREs and target genes were derived from SCENIC + regulons (Fig. 5, 6, Table S18). Overlap between TF target genes was assessed by calculating the Jaccard index between lists of target genes of individual TFs (Fig. 5D). TF score dynamics along pseudotime trajectories were visualised by fitting a locally estimated scatterplot smoothing curve with a smoothing factor of 0.4 using the geom_smooth function (Fig. 5F).
Ligand-receptor analysis
Analysis of ligand-receptor interactions was performed using the CellChat (v2.1.0) package(59). We asked how different PT subclusters interact with immune cells and fibroblast, therefore the dataset was subset to these cell types. To infer interaction the CellChat workflow was followed with default setting, except adjusting the truncated mean gene filtering to 0.01 in the computeCommunProb function in order to account for low expression levels observed particularly by chemokine genes. For visualisation purposes ligands and receptors overexpressed in injured or inflammatory PT cells were extracted and plotted using the Seurat DotPlot function for the ligands and receptors identified by CellChat. Interactions were manually connected using the CellChat interaction database as reference, while omitting individual receptors with no expression or non-relevant expression patterns due space constrains (Fig. 4A, 4F).
Re-analysis of mouse sc/snRNA-seq datasets
Count matrices from previously published datasets were downloaded from gene expression omnibus under accession numbers GSE140023(15) and GSE139107(9). In both cases the count matrices were loaded in R and cells were subset to clusters identified as PT cells in the original publication. For both datasets Seurat RPCA integration was used to account for batch effects in experimental design. For the R-UUO datasets the data was split by the timepoint of sample collection (Control, UUO2, UUO7, R-UUO) and each subset was independently normalised using SCTransform with the option vst.flavor = "v2" and principal components were computed using RunPCA with npcs = 100. Integration anchors were computed using FindIntegrationAnchors with options reduction = "rpca" and k.anchor = 30 and subsets were integrated using IntegrateData with dims = 1:50. Subsequently the UMAP dimensionality reduction and clusters were computed using RunUMAP, FindNeighbors and FindClusters with the first 50 principal components. Analogously, the IRI data integrated using the replicate information provided in the GEO records.
In both cases cluster markers were computed using the FindAllMarkers function with options min.pct = 0.01 and logfc.threshold = 0.1. Similarly, we found clusters expressing injury markers such as Havcr1 and Vcam1 were enriched post-injury. We identified subclusters expressing markers of maladaptive repair(9), showing similar expression to inflammatory PT cells in human kidneys such as Ccl2, Cxcl1, Icam1 and Tgfb2. Notably, the expression of Vcam1 and Dcdc2a was more restricted to this cluster and not found in generic injury clusters in both datasets, as opposed to humans where respective VCAM1 and DCDC2 genes were more widely expressed (Fig. S7C). Due to overall similarities, we followed naming conventions using for the human data. Finally, the proportion of each cell state was calculated as the proportion of total PT cells in each animal model at each timepoint and visualised as time series graph (Fig. 2G).
CosMx spatial molecular imaging
Sample processing
5 µm sections were cut from FFPE embedded tissue blocks and adhered to Superfrost Plus slides (VWR, 631 − 0108). Slides were deparaffinized with two 5 min Xylene washes and two 2 min 100% ethanol washes followed by proteinase K (3 µg/ml; Thermo Fisher, AM2546) digestion (40°C for 30 min) and heat induced epitope retrieval at 100°C for 15 min in ER1 buffer (Leica, AR9961). After rinsing with nuclease free water, slides were incubated with 1:1000 diluted fiducials (Bangs Laboratory) in 2X SSCT (2X saline sodium citrate, 0.001% Tween 20) for 5 min at room temperature. After PBS washes, slides were fixed with 10% neutral buffered formalin for 5 min at room temperature. After rinsing with Tris-glycine buffer (0.1 M glycine, 0.1 M Tris-base) and a 5 min wash with PBS, samples were blocked using 100 mM N-succinimidyl acetate (Thermo Fisher, 464750250) in NHS-acetate buffer (0.1 M NaP, 0.1% Tween pH 8) for 15 min at room temperature. Samples were rinsed with 2X saline sodium citrate (SSC) for 5 min and covered with an Adhesive SecureSeal Hybridization Chamber (Grace Bio-Labs).
The 980-plex ISH probe panel(60) (Suppl Table 20) was denatured at 95°C for 2 min and placing on ice. The ISH probe mix was prepared using 1nM ISH probes in 1X Buffer R and 0.1 U/µL SUPERaseIn RNase Inhibitor (Thermo Fisher, AM2696). Samples were then incubated with the probe mix at 37°C overnight. On the next day slides were washed with 50% formamide (VWR) in 2X SSC at 37°C for 25 min and rinsed twice with 2X SSC for 2 min at room temperature. Before antibody incubation slides were treated with blocking buffer containing DAPI nuclear stain for 15 min. For visualization of cell morphology antibodies against C298/B2M, PanCK, CD45 and CD3 from the CosMx Segmentation and Supplemental Markers Kit were chosen and incubated at room temperature for 1 hours. Finally, after washing with 2X SSC for 2 min at room temperature custom-made slide covers were attached to the sample slide to form a flow cell.
Image acquisition
RNA readouts were acquired according to previously described protocols(60). Briefly, the assembled flow cells were loaded onto the SMI instrument and washed with reporter buffer. The flow cell was washed and a median of 12 and total of 140 field of views (FOVs) were defined inside which RNA was detected. The imaging procedure was repeated with 16 reporter pools, with each imaging step repeated 8 times to increase RNA detection sensitivity. Each cycle was initiated by flowing 100 µL of the Reporter Pool into the flow cell and incubating for 15 min followed by flushing with 1ml of Reporter Wash Buffer. The wash buffer was replaced with imaging buffer and nine Z-stack images (0.8 µm step size) of each FOV were acquired. Each cycle was completed with UV cleavage of fluorophores on the reporter probes and washing using Strip Wash buffer.
Following RNA readouts, staining for DAPI and the four morphology antibodies was acquired by capturing nine Z-stack images after washing with Reporter Washing Buffer.
Image processing and cell segmentation
Image processing and feature extraction followed the NanoString pipeline previously described(60) which extracted XYZ locations of individual transcripts. After image registration, feature detection and localisation the Z-stack images the nuclear and cell membrane markers were used define cell boundaries using a machine learning algorithm(60). Cell segmentation boundaries were used to assign transcripts to cells and subcellular compartments from which a count matrix was generated. Pipeline outputs included a cell by gene matrix, cell segmentation polygons, and cell meta data.
Data processing, clustering and dimensionality reduction
Data was initially imported into Seurat using the LoadNanostring() function. The number of transcripts and genes per cell were assessed for regional biases (Fig. S11A,B) and cells with < 20 transcripts were excluded from quantitative analysis but not from image plots. Batch effects between biopsy and nephrectomy samples were observed and therefore integrated using the harmony (v0.1) package(61).
For this, each group was independently normalised with SCTransform using vst.flavor = "v2", and principal components were computed with RunPCA with option npcs = 100. RunHarmony was then used to align principal components and the UMAP projection was computed on adjusted principal components. Cell types were defined in an iterative manner similarly to the multiome dataset by first defining broad clusters resembling distinct cell lineages, followed by sub-clustering individual clusters.
Cell type annotations were guided by marker genes previously identified in the multiome dataset. As the probe panel did not target kidney-specific genes similar clustering resolutions were not possible, however we could resolve broad nephron segment signatures based on distinct marker expression patterns. For example, EGF expression characteristic of LOH and DCT tubule cells was observed, but due to lack of additional markers discriminating these two groups robust sub-clusters could not be derived. Therefore, epithelial cells of the nephron were broadly divided into PT, LOH-DCT as well as collecting duct (CD, with resolved PC and IC) segments. Notably, no CNT and ATL/DTL markers were detected, and these are likely grouped together with other segments. Using the previously defined injury and inflammatory markers we next searched for patterns of epithelial cell activation by sub clustering the broader epithelial clusters. We observed expression of injury markers such VCAM1, TPM1, VIM, SPP1, ITGB6 and ITGB8. Further, markers predominantly expressed by inflammatory cells such as CCL2, CXCL1, ICAM1 and MMP7 were co-expressed by subsets of epithelia, confirming expression patterns observed in the multiome dataset (Fig. 3D). Therefore, these cells were annotated following these conventions. We observed expression patterns clearly identifying other non-epithelial cell types such as endothelia, fibroblasts, smooth muscle cells, mesangial cells, myeloid cells as well as T and B lymphocytes. These were further subclustered if discrete populations were observed, e.g. myofibroblasts are distinguished by expression of COL1A1 and COL3A1 in addition to shared fibroblast markers (Fig. S12). In addition to level 1 and level 2 annotation, for visualisation purposes we derived a further annotation summarizing healthy, injured, and inflammatory epithelia as well as broad non-epithelial cell types (Fig. S11D). Visualisations of cell segmentations and transcript locations were generated using Seurat’s ImageDimPlot function. For epithelial cell clusters proportions were calculated as the number of cells relative to the total of that cell type, while for non-epithelial cell proportions were calculated relative to the total number of cells.
Niche analysis
To define cell niches, a cell x neighbouring cells matrix was constructed by counting neighbouring cell types within a 75 µm radius. To account for differences in cell density, each entry was subsequently divided by the number of observed neighbours. Niches were defined using the Mclust algorithm with an EII model. Initially, 20 clusters were defined and annotated according to the most prominent cell type or anatomical structure in the cluster. Similar clusters were collapsed until 7 niches were defined (Fig. 3E). Enrichment of cell types within niches was assessed by calculating a log2-fold enrichment ratio defined as the quotient between the observed and expected number of cells of cells (proportions of the cell type across the dataset) plus 1. After log-transformation the data was centred on 0 (equivalent to no enrichment) by subtracting 1 (Fig. S13C). Correlation between the proportion of cells in the fibrotic niche and eGFR or %fibrosis by PSR staining was assessed using a linear regression slope with Pearson correlation coefficient and associated p-value (Fig. S13D).
Cell neighbourhood analysis
We inferred enrichment of a given cell type (ctquery) in proximity to other cell types by first defining a search radius of 25 µm measured from the centroid (of each cell’s boundary polygon) to the centroid of a neighbouring cell. Next, we generated a ctquery by ctneighbour count matrix by counting the number of encounters of each level 2 annotations within the search radii of each query cells. Observed counts were then normalised by the absolute number of cells of the respective cell type encountered in the tissue section giving the observed frequency fobs.
Simultaneously, we calculated the expected frequency fexp as the proportion a given cell type across the entire tissue sample multiplied by the total number of cells encountered within the search radius (equivalent to random tissue architecture). The enrichment ratio was then defined as the log2-fold ratio between (fobs / fexp) + 1 and calculated for each ctquery to ctneighbour pair.
To derive a symmetric matrix the log2-fold enrichment ratios of a given ctquery to ctneighbour pair were multiplied with each other. For visualisation purposes enrichment values were log2 transformed and centred on 0 by substracting 1 (with > 0 indicating positive enrichment). We calculated the enrichment ratio for each tissue sample individually and used the mean across all samples for visualisation as heatmap (Fig. 3H). For individual pairs we assessed statistical significance of enrichments across samples using the Wilcoxon signed-rank test with the R function wilcox.test with alternative = "two.sided" (Fig. 3I).
Transcript enrichment
In order to assess transcript enrichment in proximity of a given cell type we defined search radii up to a distance of 50 µm in 1 µm steps from the cell centroid. For a given cell type the number of detected transcripts of a given species in each interval were then counted and normalised by the area of the circle segment and the number cells in the area. Normalised transcript counts were averaged over all tissue sections and plotted as log2 + 1-fold enrichment over a sample of 10,000 random cells (independent of cell type) (Fig. 3J, S14C).
Cell culture
Primary renal proximal tubule epithelial cells (RPTEC/TERT1, ATCC CRL-4031) were cultured at 37°C in a 5% CO2 incubator using custom medium containing 25 ng/ml hydrocortisone (Merck, H0888), 3.5 µg/ml ascorbic acid (Sigma-Aldrich, A4403), 1% insulin-transferrin-selenium-ethanolamine (ITS-X) (Thermo-Fisher Scientific, 51500056), 6 pM Triiodo-L-Thyronine (Merck, T6397), 25 ng/ml Prostaglandin E1 (Merck, P8908), 0.8% v/v of 0.28 g/ml N-2-hydroxyethylpiperazine-N-2-ethane sulfonic acid (HEPES, Sigma-Aldrich, H4034) dissolved in 1M sodium hydroxide, 10 ng/ml recombinant human Epidermal Growth Factor (Promega, G5021) and 0.1 mg/ml geneticin (Gibco, 10131027) in Dulbecco’s Modified Eagle Medium (GIBCO, 31331093). All experiments were performed with cells between passage 3 and 6.
For irradiation experiments, cells were plated in 6 well plates (200,000 cells in 2 ml media) then administered 10 Gy of irradiation 3 days later using a Caesium-137 gamma irradiator (CIS Bio International Gamma Irradiator type IBL 637). Control cells were ‘mock’ irradiated (taken to irradiator but not exposed to irradiation). Media was changed immediately following (+/- mock) irradiation and again 3 days later. SA-β-GAL staining (Cell Signalling Technology) was performed as per the manufacturer’s instructions and imaged on the EVOS FL Auto 2 microscope. Images were analysed in in Image J (version 8). Separate macros were run for total cell counting and SA-β-GAL positive cell counting.
Samples were collected for RNA analysis 7 days after irradiation using 1 ml of TRIzol (Invitrogen, 15596026). The suspension was stored at − 80°C until RNA extraction. RNA was extracted using the RNeasy Plus micro kit (Qiagen) and included the optional DNase step. The RNA was eluted in 14 µl RNase-free water and concentration estimated using a Nanodrop One (Thermo- Fisher Scientific). RNA quality was assessed using the RNA 6000 Pico Kit (Agilent, 5067 − 1513) for RNA integrity numbers (RIN) > 7. Extracted RNA was sent to Genewiz where Poly-A enriched libraries were prepared using the NEBNext Ultra II RNA Library Prep Kit for Illumina following manufacturer’s instructions (New England Biolabs, E7760).
For CUT&RUN experiments were plated at 0.8 × 106 cells/well into 6-well plates. The next day cells were treated with recombinant human TNF (20 ng/ml, Bio-Techne, 210-TA-020). After 24 hours cells were detached using 0.05% Trypsin and 0.02% EDTA in PBS for 8 min and resuspended in culture medium before nuclei isolation.
Bulk RNA-seq analysis
RNA-seq libraries from RPTEC irradiation experiments were sequenced on an NovaSeq 6000 platform (Illumina) with a mean output of 159 million 150bp paired-end reads per sample. After sample demultiplexing, adapter sequences were trimmed from FASTQ files using Trim-Galore (Cutadapt v2.8). Reads were aligned the GRCh38 (hg38) reference genome (GENCODE primary assembly) with a mean alignment rate of 84.3% using STAR (v2.7.9a)(62) with the option --quantMode GeneCounts. Gene counts were imported to R and analysed using DESeq2 (v3.18)(63). Genes with less than 200 overall counts were removed from the analysis. Differential expression analysis was performed using the DESeq function with default settings while controlling for variations due to replicate number in the model design. For visualisation purposes the count data were normalised using a variance stabilising transformation implemented by the DESeq2 vst function and scaled before plotting.
Re-analysis of ABT-263 RNA-seq data
Count matrices of previously published RNA-seq data were downloaded from gene expression omnibus under accession numbers GSE157866(42). Transcript abundances (Salmon transcript-level quantification) were imported into R and summarized to a gene level using the tximport function. Gene IDs were subsequently annotated to MGI nomenclature using BioMart functionality and the data was analysed using DESeq2 as described previously. Briefly, a DESeq2 object was created using DESeqDataSetFromMatrix with treatment and replicate as the design formula. Genes with less than 10 gene total gene counts were removed and the data was normalised using the DESeq and vst function with default setting. After Variance stabilizing transformation expression and gene-wise scaling, expression values of previously identified markers were visualised using pheatmap. Sample 4 from the ‘vehicle post R-UUO’ group was excluded from visualisation, due to behaving during outlier in principal component analysis and gene expression not behaving similarly to other samples in the group, leaving 5 remaining samples in the group.
CUT&RUN
Library preparation
CUT&RUN libraries from TNF-treated RPTECs were generated using the CUTANA ChIC/CUT&RUN kit (EpiCypher, 14–1048) with provided reagents unless indicated. After cell harvest samples nuclei were isolated. For this, cells were resuspended in 500 µl cold nuclei extraction buffer containing 2 mM HEPES pH 7.9, 10 mM KCl, 0.1% Triton X-100 (Merck, X100), 20% glycerol, 1X cOmplete protease inhibitor cocktail (Roche, 11873580001), 0.5 mM spermidine in PBS. After incubation on ice for 5 min, nuclei were centrifuged (500 g, 5 min, 4°C) and the nuclei pellet was resuspended in nuclei extraction buffer without Triton X-100. Nuclei were counted using a LUNA-FX7 cell counter (Logos Biosystems) and 0.5×106 nuclei were used as input for the CUT&RUN assay.
The CUT&RUN assay was then performed following manufacturer instructions. Briefly, activated Concanavalin A (ConA)-conjugated paramagnetic beads were bound to nuclei. Using a magnetic rack bead-bound nuclei were washed and resuspended in antibody buffer. For control reactions 0.5 µg IgG negative control antibody (EpiCypher, 13–0042) was used while 0.5 µg of NFKB1 (Cell Signalling Technology, 13586S) or JUN (EpiCypher, 13-2019) were added for positive reactions. After overnight incubation and washes pAG-MNase was added and chromatin was digested by addition of CaCl2 solution. Digestion was stopped by addition of the Stop Master Mix and the supernatant was transferred to a DNA clean-up column. After washes DNA was eluted in 12 µl of DNA Elution Buffer and quantified using the Qubit dsDNA Quantification kit (Thermo Fisher Scientific, Q32851).
For each reaction 5 ng purified DNA were used as input for library preparation using the CUTANA CUT&RUN Library Prep Kit (EpiCypher, 14–100) according to manufacturer instructions. Library was quantified using the Qubit dsDNA Quantification kit and fragment size was assessed using the Bioanalyzer High Sensitivity DNA Analysis kit (Agilent, 5067 − 4626).
CUT&RUN Analysis
Libraries were sequenced on a NextSeq 2000 platform (Illumina) with 100bp paired-end reads and a mean read depth of 10 million reads per library. After library demultiplexing remaining Illumina indices were removed using Trim Galore (Cutadapt v2.8) and aligned to the GRCh38 (hg38) reference genome with bowtie2 --local --very-sensitive --no-mixed --no-discordant --phred33 -I 10 -X 700 (v2.5.2)(64). Peak calling was performed using MACS2 (v2.2.7.1) with default settings. The peak significance threshold was determined using p-values derived from MACS2 with manual inspection of peak calling sensitivity at expected locations. A threshold of -log10(p-value) of 5 and 4 were used for JUN and NFKB1 peaks respectively (Fig. S17A). The BAM files were converted to BigWig format using the bamCoverage command from DeepTools (v3.5.4) with the normalisation option --normalizeUsing CPM and visualised using Signac’s CoveragePlot() function. Overlap of ATAC peaks with CUT&RUN peaks was calculated requiring a 10% overlap in both peaks using the bedtools (v2.31.0) intersect function with parameters -wao -r -f 0.1. The overlapping ATAC peaks were used to infer differential accessibility in these regions using the RunPresto function with parameters logfc.threshold = 0, min.pct = 0.00 or find motifs using the Signac function FindMotifs with default settings (Fig. 6F-G). The fraction of ATAC reads in CUT&RUN peaks for each cell was calculated using Signac’s FeatureMatrix() function using the original CUT&RUN peaks in BED format as the feature set to be quantified (Fig. S17E-F).
Data availability
Raw and processed sequencing data is available on Gene Expression Omninbus. Multiome snRNA/snATAC-seq data is available under the accession GSE254185. CosMx pipeline outputs are available under accession GSE253439. Bulk RNA-seq data of irradiated RPTECs and CUT&RUN data can be found under accessions GSE252584 and GSE254187 respectively.
R and python scripts employed in the analysis are available from the corresponding authors on request.
Statistics
Comparisons of clinical and histological characteristics between the unobstructed and obstructed groups were performed by students t-test or Chi-test for continuous or categorical data respectively. In the CosMx analysis, the relationship between abundance of cell types and eGFR or percentage fibrosis area was assessed using a linear regression slope with Pearson correlation. In the AP-1 inhibitor experiment, comparisons between the vehicle and T-5224 groups were assessed by students t-test.