Attention-based deep learning for accurate cell image analysis

doi:10.21203/rs.3.rs-3933549/v1


Article



This work is licensed under a CC BY 4.0 License

Version 1



High-content analysis (HCA) holds enormous potential for drug discovery and research, but widely used methods can be cumbersome and yield inaccurate results. Noisy and redundant signals in cell images impede accurate deep learning-based image analysis. To address these issues, we introduce X-Profiler, a novel HCA method that combines cellular experiments, image processing, and deep learning modeling. X-Profiler couples a convolutional neural network with a Transformer to encode high-content images, effectively filtering out noisy signals and precisely characterizing cell phenotypes. In comparative tests on drug-induced cardiotoxicity, mitochondrial toxicity classification, and compound classification, X-Profiler outperformed both DeepProfiler and CellProfiler, two widely recognized and representative methods in this field. Our results demonstrate the utility and versatility of X-Profiler, and we anticipate its wide application in HCA for advancing drug development and disease research.

Biological sciences/Computational biology and bioinformatics/High-throughput screening

Biological sciences/Drug discovery/Drug screening/High-throughput screening

Developing novel pharmaceutical drugs demands a substantial investment of time and resources, yet the success rate remains low. The major obstacles hindering drug development include ineffective drug activity, intractable drug toxicity, and marketing difficulties^1,2. Medical research focuses on identifying the causes and deciphering the mechanisms of disease, with the eventual aim of developing accurate diagnostic tools and effective treatments. In recent decades, high-throughput methods have emerged that can generate enormous data sets quickly and cost-effectively³. The exploitation of so-called ‘big data’ will enable research projects that were never previously possible, but it should also stimulate a re-evaluation of all our data practices. The high throughput and affordability of current technologies such as omics and imaging, which allow quantitative measurement of many putative targets (e.g. DNA, RNA, protein, metabolite), have exponentially increased the volume of scientific data available for this arduous task⁴. Data-driven medicinal chemistry approaches have the potential to improve decision making in drug discovery projects, provided that all researchers embrace the role of ‘data scientist’ and uncover the meaningful relationships and patterns in the available data⁵. The success of deep learning models relies heavily on the availability of vast amounts of data and advanced algorithms⁶. Massive data helps to train the model effectively without premature overfitting, while a sound and sufficiently complex network structure endows the model with good memory and information extraction capability⁷. In addition, the diversity of large data sets simplifies model training and prediction by removing the need for complex preprocessing steps such as segmentation, tracking, tracing, and spatial conversion. As a result, the task can be performed as an end-to-end prediction, simplifying the entire process⁸.

Cellular morphology is a potentially rich data source for interrogating biological perturbations, especially at a large scale^9–11. For example, cellular morphological profiling of compounds has been used to determine their mechanism of action^8,12,13, identify their targets^14,15, discover relationships with genes^16,17, and characterize cellular heterogeneity¹⁸. Genes have been analyzed by creating profiles of cell populations in which the gene is perturbed by CRISPR or RNA interference (RNAi); these profiles have been used to represent the functional landscape of essential human genes^19–21, identify genetic interactions^22,23, and characterize cellular heterogeneity²⁴. Recently, the Joint Undertaking for Morphological Profiling (JUMP)-Cell Painting Consortium released a dataset that contains images and profiles from the Cell Painting assay for over 116,750 unique compounds, over-expression of 12,602 genes, and knockout of 7,975 genes using CRISPR-Cas9 in human osteosarcoma cells (U2OS)²⁵. One of the main tools currently used for image processing and interpretable feature extraction is CellProfiler^26,27, developed by the Imaging Platform of the Broad Institute. It was designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically. A wide variety of measurements can be generated for each identified cell or subcellular compartment, including morphology²⁸, intensity, and texture²⁹, among others. Another feature extraction tool, the deep learning-based DeepProfiler presented by Cimini and Carpenter et al., uses weakly supervised learning (WSL) to model associations between images and treatments; the authors showed that it encodes both confounding factors and phenotypic features in the learned representation³⁰.

Despite the obvious advantages of ‘big data’, the field has to overcome important challenges: most of the analyses performed have inherent biological and experimental errors and rely on capturing a snapshot of complex and dynamic biological systems³¹. Consequently, untimely sample collection, whether due to an incorrect experimental design or simply to randomness, can introduce so much ‘noise’ that inciting factors or errors cannot be clearly identified³. In cellular phenotypic experiments, the effects of a compound on cells are stochastic, and the properties of compounds cannot be accurately distinguished from images of one or a few cells³². Therefore, a ‘noise’ filtering tool is needed to adaptively filter out irrelevant cell-slice images and accurately extract features for cell phenotype characterization. Transformers use a self-attention mechanism to weigh the importance of each token in the context of the entire input sequence, which allows the model to capture long-range dependencies and relationships between tokens³³. To address the aforementioned issues, we combined a convolutional neural network with a Transformer to develop X-Profiler, a deep learning-based method that filters ‘noise’ and extracts features relevant to downstream tasks, in order to obtain valuable information from high-content images for drug analysis.

In this study, we conducted a comprehensive analysis of cell phenotype images, covering aspects from data processing to model training and validation. To evaluate the applicability of our method to high-content image processing, we chose three widely recognized downstream tasks in drug discovery: Task 1 - determining drug inhibition of hERG (the human Ether-à-go-go-Related Gene) ion channels, Task 2 - predicting drug-induced mitochondrial toxicity, and Task 3 - classifying compounds. In all tasks, X-Profiler consistently surpassed state-of-the-art (SOTA) methods, demonstrating more accurate and effective feature extraction.

Model architecture of X-Profiler

In this study, we address the issue of high impurity in the dataset by introducing a novel deep learning framework. Instead of expecting each individual slice of the dataset to conform directly to a desired label mapping, we explicitly assign the slices distinct rankings and weights. Specifically, we used cell painting data to train and verify our model, segmenting large cellular images into individual slices based on the positional information of cells and constructing intercellular associations by arbitrarily selecting subsets of the single-cell slices. We introduced the concept of a "patch": a bundle of different single-cell slice images subjected to the same perturbation. By employing a ResNet to extract image features and a Transformer to capture the relational patterns within the same patch, we can identify and exclude poor-quality images, thereby enhancing the model's accuracy. The training and validation processes of the model are shown in Fig. 1a. During model training and validation, 6 to 12 single-cell image slices were bundled into a patch and fed into the model in batches. During the training phase, the deep learning model iteratively learns from the training data. In each epoch, which is a full pass through the dataset, the model performs a forward pass to make predictions and then calculates the loss by comparing these predictions to the actual labels. In the subsequent backward pass, the model uses backpropagation to compute the gradient of the loss with respect to each weight, and the optimizer then updates the model's weights to minimize the loss. After each epoch, the model's performance is evaluated on a separate validation set, which helps monitor its ability to generalize to unseen data. During the prediction phase (Fig. 1b), we randomly sampled the images and packaged single-cell images into patches of 10 to 750 units. The model then performs a forward pass and uses the output of the linear layer as the predicted score. Along with the model's predictions, we also obtain a ranking for each individual cell image within its patch, which completes the data processing workflow.
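The patch construction described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `make_patches`, the `(perturbation, slice_id)` data layout, the fixed random seed, and the choice to discard incomplete patches are all assumptions made for the example.

```python
import random
from collections import defaultdict

def make_patches(slices, patch_size, seed=0):
    """Bundle single-cell slices that share a perturbation into patches.

    `slices` is a list of (perturbation, slice_id) pairs. The layout and
    the function name are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    by_perturbation = defaultdict(list)
    for perturbation, slice_id in slices:
        by_perturbation[perturbation].append(slice_id)

    patches = []
    for perturbation, ids in sorted(by_perturbation.items()):
        rng.shuffle(ids)  # random selection of slices within one perturbation
        # Assumption: keep only full patches so every patch has `patch_size` slices.
        for start in range(0, len(ids) - patch_size + 1, patch_size):
            patches.append((perturbation, ids[start:start + patch_size]))
    return patches

# Toy input: 20 slices from one compound and 17 from the DMSO control.
slices = [("carvedilol", i) for i in range(20)] + [("DMSO", i) for i in range(17)]
patches = make_patches(slices, patch_size=8)
```

Each resulting patch contains slices from a single perturbation only, so the patch can inherit the label of that perturbation during training.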

X-Profiler’s performance on hERG inhibition, mitochondrial toxicity and compound classification

We designed three tasks to evaluate the capability of X-Profiler. For the hERG inhibition task, we selected 100 compounds (47 toxic and 53 non-toxic) for high-content screening (HCS) experiments and generated the corresponding high-content images using a high-content imager. We chose 38 compounds (19 toxic and 19 non-toxic) as the test dataset for determining drug inhibition of hERG ion channels. The high-content images were used to assess the hERG inhibition classification capabilities of three models: X-Profiler, DeepProfiler, and CellProfiler. X-Profiler outperformed the feature extractors DeepProfiler and CellProfiler, achieving an accuracy of 90.6% in cell image classification, with precision, recall and F1 scores of 80.0%, 95.5% and 86.2%, respectively (Fig. 2a). Moreover, a UMAP projection of the compound features showed two separate clusters for hERG channel inhibitors and non-inhibitors (Fig. 2b). The silhouette score and adjusted Rand score of the clusters of toxic and non-toxic compounds were 0.446 and 0.272, respectively, indicating that X-Profiler is able to capture the cell phenotype changes induced by hERG channel inhibitors.

For the mitochondrial toxicity task, we mapped all 100 compounds to the Tox21 dataset annotations³⁴. Of these, 77 compounds carried additional mitochondrial toxicity labels (Supplementary Table 2). We selected a test set of 15 compounds, of which 1 was labeled 'Active', 3 were labeled 'Inactive', and 11 were labeled 'Inconclusive'. Mitochondrial toxicity caused by certain compounds is a widespread form of organ toxicity, which can result in multiorgan damage in the heart, liver, bone, and brain^35–37. The alterations in cell morphology, texture, and intensity caused by compounds are strongly correlated with mitochondrial toxicity³⁸. We evaluated the classification performance of the three models. X-Profiler outperformed the other two models in terms of accuracy, recall, precision and F1 score, with values of 54.5%, 52.4%, 47.3% and 43.3%, respectively (Fig. 2c). The feature landscape for mitochondrial toxicity is shown in Fig. 2d; the silhouette score and adjusted Rand score of the clusters of active, inactive and inconclusive compounds were 0.023 and 0.083, respectively. There may be two reasons for the suboptimal performance on mitochondrial toxicity: 1) the number of labeled drugs decreased while the number of classes increased, and less data implies lower model robustness; 2) the effects of mitochondrial dysfunction are often confusingly cell type-specific, as is the case for the majority of known mitochondrial diseases³⁹. Human HepG2 cells were used for labeling, whereas iPSC-CMs might react differently to the same compounds.

The compound classification task involved classifying images based on the cell response to different compounds. For each compound, 80% of the images were used for training and 20% for testing. X-Profiler outperformed DeepProfiler and CellProfiler, achieving an accuracy, precision, recall and F1 score of 0.867, 0.892, 0.871 and 0.881, respectively. X-Profiler thus demonstrates a superior ability to discern relationships among images (Fig. 2e, Fig. 2f).
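As a quick consistency check on the reported metrics, the F1 score is the harmonic mean of precision and recall. The short sketch below (the function name `f1_score` is our own, not taken from the study's code) recomputes it from the precision and recall reported for the compound classification task.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for X-Profiler on compound classification.
f1 = f1_score(0.892, 0.871)
print(round(f1, 3))  # 0.881
```

The recomputed value agrees with the reported F1 score of 0.881.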

Comparison of distance and angle for high-dimensional features

We quantified the distance and cosine value between the high-dimensional image features of various compounds at differing concentrations (X-Profiler: 512-d; DeepProfiler: 672-d; CellProfiler: 613-d), in order to better understand each model's capacity to discriminate between cellular features after pharmacological intervention. We determined the cosine similarity of compounds with DMSO, as illustrated in Fig. 3a. The cosine values spanned from −1 to 1, corresponding to angles between the feature vectors ranging from 180 degrees down to 0 degrees. The cosine values of toxic and non-toxic compounds corresponded to distinct angles in the high-dimensional space. Although the Mahalanobis distance of the compounds relative to DMSO did not reveal a marked difference between toxic and non-toxic compounds, the Mahalanobis distance across concentrations of a single compound increased with concentration, and these values were used to deduce the IC50 value of each compound (Fig. 3b). A comparative analysis of X-Profiler with CellProfiler and DeepProfiler is presented in Fig. 3c. The distribution of the absolute differences between true and predicted IC50 values underscores X-Profiler's superior precision, with a mean and standard deviation across all compounds of 2.66 and 2.94, respectively. In contrast, DeepProfiler and CellProfiler yielded 4.45 ± 4.84 and 3.48 ± 3.56, respectively. The IC50 prediction curves for carvedilol obtained with X-Profiler, DeepProfiler, and CellProfiler provide an illustrative example of X-Profiler's higher accuracy in determining IC50 values (Fig. 3d). By inferring the IC50 values of the compounds from the calculated feature distances, we demonstrate that X-Profiler has great potential to replace traditional IC50 detection methods.
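The cosine comparison against the DMSO control can be illustrated with a small sketch. The three-dimensional vectors below are toy stand-ins for the 512- to 672-dimensional features, and the function names are hypothetical. Note that the angle between two vectors recovered from a cosine value always lies between 0 and 180 degrees.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between feature vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angle_degrees(u, v):
    """Angle between u and v in degrees, in the range [0, 180]."""
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    c = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.degrees(math.acos(c))

dmso = [1.0, 0.0, 0.0]        # toy stand-in for the DMSO control feature
compound = [0.0, 2.0, 0.0]    # toy stand-in for a treated-compound feature
opposite = [-3.0, 0.0, 0.0]   # feature pointing away from the control

orthogonal_angle = angle_degrees(dmso, compound)
opposite_angle = angle_degrees(dmso, opposite)
```

Because cosine similarity ignores vector length, it captures the direction of a phenotypic shift relative to DMSO, whereas a distance measure such as the Mahalanobis distance captures its magnitude.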

Multiple cell slices enable accurate grasp of the compound property

During feature extraction with X-Profiler, we obtained a class embedding for each patch of images, referred to as the ‘multiple cells features’ (cls embedding), along with an embedding for each individual image in the patch, termed the ‘single cells features’. We analyzed the multiple cells and single cells image features separately in the compound classification task and compared the feature landscape of X-Profiler with those of DeepProfiler and CellProfiler. Here, the multiple cells features aggregate the mean of 100 single cell slice features, whereas the single cells features are a random selection from all single cell slices (Fig. 4a to Fig. 4f). We selected the first ten compounds (alphabetically sorted) from the set of 100 compounds for the t-SNE plots and presented the compound distances in two-dimensional space.

We observed that CellProfiler features fail to differentiate changes in cellular phenotype between compounds, for both single cells features and multiple cells features (Fig. 4e, Fig. 4f). The single cells feature clusters generated by DeepProfiler also failed to distinguish between compounds (Fig. 4d). For X-Profiler and DeepProfiler, the separation between compound features is clearer in the landscape of multiple cells features than in that of single cells features. For instance, benzethonium chloride and betrixaban appear as a single cluster in the single cells feature landscape (Fig. 4a), whereas in the multiple cells landscape they form discrete clusters (Fig. 4b). This suggests that integrating a greater number of image slices can yield more comprehensive features for accurately predicting compound properties. Furthermore, the multiple cells features of DeepProfiler merely attain the level of clustering that X-Profiler achieves with single cells features (Fig. 4c). We believe that the convolutional networks in X-Profiler and DeepProfiler produce more effective features for the analysis of cell painting data than traditional statistical algorithms such as CellProfiler, notwithstanding the limited interpretability of these features. Nevertheless, X-Profiler exhibits superior predictive capacity compared with DeepProfiler. The attention mechanism in X-Profiler improves the aggregation of cellular features, outperforming DeepProfiler, which only calculates the mean of individual cell features. This underscores the importance of leveraging the Transformer architecture to enhance feature extraction from intricate and unstructured data.
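The difference between mean aggregation and attention-based aggregation can be sketched with a toy example. This is a deliberate simplification and not the X-Profiler architecture: a single fixed `query` vector stands in for the CLS token, whereas a real Transformer uses learned query, key and value projections across several heads and layers. All names and values below are illustrative assumptions.

```python
import math

def mean_pool(features):
    """Unweighted average of per-cell feature vectors."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def attention_pool(features, query):
    """Softmax-weighted average using scaled dot-product scores against `query`."""
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, f)) / math.sqrt(d) for f in features]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]
    return pooled, weights

# Two cells that carry the phenotype of interest and one off-target cell.
cells = [[4.0, 0.0], [4.0, 0.2], [0.0, 4.0]]
query = [1.0, 0.0]  # hypothetical stand-in for the CLS token

mean_feature = mean_pool(cells)
attn_feature, weights = attention_pool(cells, query)
```

In this toy case the off-target cell receives a small attention weight, so the attention-pooled feature stays close to the two informative cells, while the plain mean is pulled toward the outlier.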

We compared single cell-slice images from four compounds (bedaquiline, belzutifan, carvedilol, and daclatasvir) selected from Fig. 4a. The t-SNE landscape revealed that the cellular embeddings of belzutifan, carvedilol, and daclatasvir formed a tight cluster, markedly distinct from bedaquiline. Consistently, the cellular images of belzutifan, carvedilol, and daclatasvir appeared indistinguishable upon visual inspection (Fig. 4g) and displayed comparable Reactive Oxygen Species (ROS) and Mitochondrial Membrane Potential (MMP) intensities across concentrations, a characteristic not shared with bedaquiline (Fig. 4h).

In an endeavor to comprehend the results more profoundly, we researched the mechanism of action (MOA) of the three compounds. Despite diverging therapeutic indications - belzutifan for Von Hippel-Lindau (VHL) syndrome-associated clear-cell renal cell carcinoma (ccRCC), carvedilol for hypertension, and daclatasvir for Chronic hepatitis C genotype 3 (GT-3 HCV) - their MOAs suggest that they may exert similar effects on cardiomyocytes. Belzutifan inhibits hypoxia-inducible factor 2α (HIF-2α), and research suggests that HIF-2α inhibitors can reverse pulmonary hypertension and that there are shared pathophysiologic mechanisms between cancer and heart failure^40,41. Carvedilol, a nonselective beta-adrenergic antagonist, has anti-free radical and antioxidant effects, and can resist oxidation and reduce ROS production⁴². It further impedes the cardiac Mitochondrial Permeability Transition (MPT), which can depolarize mitochondrial membranes and uncouple oxidative phosphorylation (OXPHOS)⁴². Daclatasvir, a pangenotypic NS5A replication complex inhibitor with a dual antiviral effect, inhibits RNA replication and viral assembly. Cellular ROS levels rise during HCV infection, and evidence suggests that anti-RNA viral drugs are associated with intracellular ROS levels^43,44. Given that cell phenotype reflects the proper classification of drug mechanisms, the application of deep learning holds substantial promise for future endeavors in drug discovery and therapeutic indication investigations.

X-Profiler can act as an effective cell-slice filter

To showcase the capability of X-Profiler in distilling the most salient single-cell images, we generated an attention heatmap using X-Profiler, which highlighted the cells pertinent to the task throughout the entirety of the image concerning the cardiotoxicity task (Supplementary Fig. 1a and Supplementary Fig. 1b). As indicated by the self-attention mechanism, the similarity between the embeddings of single cell-slice image and the embeddings of the CLS token is positively associated with the classification task⁴⁵. In the heatmap, cells that are more relevant to the downstream task are represented by brighter and warmer pink dots, and these dots clearly illustrate the differences between different cells in terms of task correlation. In assessing the Transformer's capacity to segregate specific cell slices, we analyzed the attributes of task-relevant single-cell slice images alongside those deemed irrelevant. We presented the landscape of relevant and irrelevant single cells image features in Fig. 5a, in which the dots symbolize task-related images, while inverted triangles represent those unrelated to the task. The single cells image features unrelated to the downstream tasks were indistinguishable among multiple compounds, while the single cells image features related to tasks were separately clustered and widely spaced apart in accordance with their respective compounds. This demonstrates that the multiple cell slices encoder can operate as a cell filter. Two ancillary observations emerged from this clustering exercise: 1) In cellular experiments involving various compounds with distinct properties, there are often both dead cells and unaffected normal cells that form a common group and do not contribute to classification. After passing through the single-cell slice encoder, the features of these cells are similar; 2) Upon exposure to different compounds, the cell phenotypes undergo more significant changes, making them markedly distinct from each other.
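The filtering idea can be sketched by ranking single-cell embeddings by their similarity to the CLS embedding. The use of cosine similarity, the `keep_fraction` cut-off and all function names are assumptions made for illustration; the text does not specify how the boundary between relevant and irrelevant slices is drawn.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_cells(cell_embeddings, cls_embedding):
    """Return cell indices sorted from most to least similar to the CLS embedding."""
    scores = [cosine(e, cls_embedding) for e in cell_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

def split_by_relevance(order, keep_fraction=0.5):
    """Assumed rule: the top `keep_fraction` of ranked cells count as task-relevant."""
    k = max(1, int(len(order) * keep_fraction))
    return order[:k], order[k:]

cls_embedding = [1.0, 1.0]                                   # toy CLS embedding
cells = [[1.0, 1.0], [1.0, 0.0], [-1.0, -1.0], [2.0, 2.1]]   # toy cell embeddings

order, scores = rank_cells(cells, cls_embedding)
relevant, irrelevant = split_by_relevance(order)
```

Here cells 0 and 3 point in nearly the same direction as the CLS embedding and are kept, while cells 1 and 2 are set aside as less relevant to the task.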

To further demonstrate the screening capabilities of X-Profiler for cell phenotype, we compared cell-slice images that were considered relevant and irrelevant based on the X-Profiler attention map (Fig. 5b). Each cell-slice image was comprised of three channels: nucleus (DNA), mitochondrial membrane potential (MMP) and mitochondrial reactive oxygen species (ROS). Task relevant images exhibited more accurate staining, higher fluorescence intensity, greater contrast between channels compared to the task irrelevant images. Moreover, relevant images exhibited more distinct cell structures in the ROS and MMP channels, leading to more accurate cell identification. In terms of image quality, task relevant images were lucid and free of impurities, in stark contrast to the irrelevant images, which exhibited a halo on the image surface caused by optical structural artifacts, resulting in unclear images and high noise levels.

Cell phenotype images are distinct from general images in two key ways. First, they contain a high degree of redundancy, with dark backgrounds and bright cells dominating most of the image. As a result, it can be challenging to distinguish differences between cells using generic training. Second, cell phenotype images are often affected by noise arising from experimental manipulations and batch-to-batch variability, which necessitates varied concentration gradients and experimental replicates to mitigate stochastic influences. Prevalent models and processing techniques have struggled to address these challenges, often yielding poor results. In contrast, the attention mechanism employed in X-Profiler enables the extraction of highly informative, accurate, and structurally intact single-cell slice images, while filtering out uninformative, erroneous, and poor-quality images. This selective extraction improves the model's performance.

Thanks to advances in high-throughput and high-content techniques, enormous amounts of biological data are generated every moment, helping us to better understand life. However, significant noise accompanies the true signal in these data. For example, the cell painting technique allows us to stain various organelles simultaneously, but the ability to use these images to build biologically relevant models depends heavily on the algorithm used to separate information from noise. Current methods generally treat every slice of the data, such as each single-cell image, equally, which limits their ability to fully exploit high-content data. Here we propose a deep learning model with the Transformer architecture at its core that assigns different weights to different pieces of data. In this way, we can build models that pay more attention to high-quality data and less attention to low-quality data, which effectively suppresses the influence of noise. The results of this study show that, by weighting different pieces of data differently, our models achieve better performance on the tasks presented above. The same approach could also be applied in broader areas to improve the performance of current methods.

Model parameter selection

Numerous parameters can influence the performance of a model, such as the number of epochs, training batch size, learning rate, optimizer, and number of Transformer attention layers. Here, we evaluated a selection of critical parameters, without relying on subjective human expertise. We first explored the patch number (the number of single-cell slices per patch), a key element of our study, by conducting trials with patch numbers ranging from 8 to 12 during the training and validation phases, and with a broader range from 10 to 750 during the prediction phase. Our findings indicated that the optimal patch number for training and validation was 8, while 100 proved most effective during the prediction phase. This was evidenced by the performance metrics, including accuracy, area under the curve (AUC), and recall, which peaked when the training and validation patch number was set to 8 and plateaued as the prediction patch number reached 100 (Fig. 6a). Regarding the consumption of computing resources, the patch number used during the training and validation phases had no tangible impact on processing time or GPU utilization. In contrast, resource usage increased discernibly with the patch number used in the prediction phase (Fig. 6b, Fig. 6c). Our study also included trials on the batch size and the number of attention layers. The batch size had no significant effect on the model's performance, and the optimal number of attention layers was three (Supplementary Table 3).

Compared to DeepProfiler and CellProfiler, X-Profiler is a novel HCA process that leverages both our understanding of cellular images and advanced deep learning algorithms. It excels in screening and utilizing informative and high-quality cell-slice images to extract task-relevant cellular features, resulting in superior performance in three drug property-related tasks: 1) determination of drug inhibition on hERG ion channels, 2) prediction of drug-induced mitochondrial toxicity, and 3) classification of compounds. In addition, X-Profiler can concatenate different compounds based on similar cell phenotypes, highlighting its potential in the field of drug repurposing and indication expansion.

In conclusion, HCS is an emerging field that is still rapidly evolving in terms of experimental implementation and analytical methods and has the potential to address diverse biological problems. The key to achieving credible results in downstream tasks lies in the ability of analytical tools or computational models to select the useful parts from a large number of cell images of varying quality. Among the tested solutions for downstream tasks, CellProfiler and DeepProfiler delivered only moderate performance, while X-Profiler showed promising results. This is because X-Profiler, which leverages deep learning to use multiple cell slices simultaneously for model training and verification, can accurately capture key information and eliminate noise interference. Additionally, X-Profiler can distinguish between cells with similar phenotypes that were perturbed by different compounds, suggesting many possibilities for downstream applications. Overall, X-Profiler has the potential to serve as a valuable tool for high-content cell image analysis and to contribute to fields such as drug discovery and development, disease diagnosis, and personalized medicine.

In the next phase of our research, we plan to expand the application of X-Profiler to more cell phenotyping tasks and high-content images induced by other perturbations, such as RNA interference (RNAi) or CRISPR Cas9. Our goal is to demonstrate the versatility and effectiveness of our model in analyzing various types of cellular image data. However, we also acknowledge that algorithms based on cell phenotype images have limitations, particularly when the changes in cell phenotype induced by a compound are minimal, which may make it difficult for the model to accurately assess its toxicity and other characteristics. To address this issue, we intend to explore the integration of other high-level information sources, such as transcriptome data or videos of cells captured under brightfield conditions, using multimodal learning techniques. We believe that incorporating multiple sources of information will enhance the accuracy and robustness of our model, enabling it to tackle a wider array of biological problems.

Constructing the cardiotoxicity high-content image data

To obtain an appropriate image dataset for the hERG inhibition task, we selected 100 compounds for HCS experiments and generated the corresponding high-content images using a high-content imager. Supplementary Table 2 provides detailed information for each compound, including plate name, drug name, hERG IC50 value range, and cardiotoxicity label. We first downloaded all compounds with hERG IC50 values through the Python API of the ChEMBL database and then selected 100 compounds after deduplication and removal of drug molecules with ambiguous hERG IC50 values. The compounds were labeled as cardiotoxic or non-cardiotoxic based on their hERG IC50 values using a threshold of 10 µM (cardiotoxic if less than or equal to 10 µM, non-cardiotoxic if greater than 10 µM). This resulted in 47 cardiotoxic and 53 non-cardiotoxic compounds, on which we performed cellular experiments using hiPSC-CMs.
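The labeling rule can be written down in a few lines. The 10 µM threshold comes from the text; the helper `label_from_reports`, which treats a compound as ambiguous when its reported IC50 values fall on both sides of the threshold, is only one plausible reading of "ambiguous" and is an assumption of this sketch.

```python
HERG_THRESHOLD_UM = 10.0  # threshold stated in the text, in micromolar

def cardiotoxicity_label(herg_ic50_um):
    """Cardiotoxic if hERG IC50 <= 10 µM, otherwise non-cardiotoxic."""
    return "cardiotoxic" if herg_ic50_um <= HERG_THRESHOLD_UM else "non-cardiotoxic"

def label_from_reports(ic50_values_um):
    """Assumed ambiguity rule: return None when reported values disagree on the label."""
    labels = {cardiotoxicity_label(v) for v in ic50_values_um}
    return labels.pop() if len(labels) == 1 else None

consistent = label_from_reports([0.8, 2.5, 9.9])   # all values at or below 10 µM
ambiguous = label_from_reports([4.0, 35.0])        # values straddle the threshold
```

Compounds for which `label_from_reports` returns `None` would be excluded under this assumed rule before the final set is selected.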

The hiPSC-CMs at day 30 after cardiac induction were cryopreserved as Cauliscell hiPSC-CMs (Cauliscell Company, Nanjing, China) and thawed in a 37°C water bath with gentle shaking. After centrifugation and counting, the cells were seeded at 12,000 cells/well in cardiomyocyte plating medium (Cauliscell) into 384-well plates precoated with 10 µg/ml recombinant human vitronectin (Cauliscell). After 24 hours, the medium was replaced with cardiomyocyte maintenance medium, which was changed every other day. Once the cells started to beat rhythmically, we added the compounds, which were selected from the FDA-approved compound library (FDA-approved drug library, MedChemExpress) for cardiotoxicity testing. The working concentrations of each compound were 10 µM, 3.33 µM, 1.11 µM, 0.37 µM, 0.12 µM, and 0.04 µM, with 3 replicates for each compound at each concentration. The control group was treated with 0.1% DMSO (Sigma), and the blank control with water. After incubating the cells with compounds for 72 hours, working solutions of the dyes CM-H2DCFDA (Thermo) at 5 µM, TMRM (Thermo) at 20 nM, and Hoechst33342 (Thermo) at 5 µg/ml were prepared in maintenance medium (Cauliscell) and added to the cells for 30 min. The cells were washed with HBSS (Beyotime Biotechnology), and YOYO-3 (Thermo) was added at a final concentration of 2 nM, followed by incubation at room temperature for 10 min and washing with HBSS (Beyotime Biotechnology). Finally, the 384-well plate (PerkinElmer) was placed on a high-content imager (Operetta CLS, PerkinElmer), and images were taken with a 20x water objective in confocal mode with 9 fields of view per well, using the filter parameters shown in Supplementary Table 1.
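The listed working concentrations are consistent with a three-fold serial dilution starting at 10 µM, although the text does not state the dilution factor explicitly. The sketch below reproduces the series under that assumption and counts the fields of view acquired per compound from the stated replicates and fields per well; the function name is hypothetical.

```python
def dilution_series(top_um, factor, steps):
    """Serial dilution rounded to two decimals, starting at `top_um` (µM)."""
    return [round(top_um / factor ** i, 2) for i in range(steps)]

# Assumption: three-fold dilution, six steps, starting from 10 µM.
concentrations = dilution_series(10.0, 3, 6)
print(concentrations)  # [10.0, 3.33, 1.11, 0.37, 0.12, 0.04]

replicates = 3        # replicates per compound and concentration (from the text)
fields_per_well = 9   # fields of view per well (from the text)
fields_per_compound = len(concentrations) * replicates * fields_per_well
print(fields_per_compound)  # 162
```

Under these numbers each compound occupies 18 wells and contributes 162 fields of view, before any segmentation into single-cell slices.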

Constructing the mitochondrial toxicity high-content image data

For the mitochondrial toxicity task, we utilized open-source high-content images from Bray et al. [20]. High-content images from a total of 55 plates, covering numerous compounds, were downloaded, and each compound was assigned a mitochondrial toxicity label obtained from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/assay/pcget.cgi?query=download&record_type=datatable&actvty=all&response_type=save&aid=720637). The source assay (AID 720637) is a qHTS assay for small-molecule disruptors of the mitochondrial membrane potential from the Tox21 program.

CellProfiler and DeepProfiler data processing

For the cardiotoxicity and mitochondrial toxicity high-content images, we extracted cell phenotype features using the standard procedures of CellProfiler (version 4.2.4). After processing these features, we performed the toxicity classification tasks with the LGBMClassifier model from LightGBM, which exposes a scikit-learn-compatible interface. Once CellProfiler had identified the locations of single cells in the images, we passed the images and locations to DeepProfiler (version 0.3.1). For the classification tasks, we employed the pretrained EfficientNet model deployed within DeepProfiler and used DeepProfiler to calculate the embeddings. The CellProfiler-LightGBM and DeepProfiler results were evaluated and compared with those of X-Profiler to assess accuracy and generalizability. For all three methods (X-Profiler, DeepProfiler and CellProfiler-LightGBM), we partitioned the 285 compounds labeled for mitochondrial toxicity into a training set of 190 compounds and a test set of 95 compounds. For the 100 compounds in the cardiotoxicity dataset, the training and test sets contained 68 and 32 compounds, respectively.
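
Because labels are assigned per compound, the partition is made at the compound level, so that all images of one compound fall on the same side of the split. The sketch below assumes a simple random split; the text does not say whether the split was random or stratified, and `split_compounds` is a hypothetical helper.

```python
import numpy as np


def split_compounds(compound_ids, n_train, seed=0):
    """Shuffle compound identifiers and split them into train and test lists."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(list(compound_ids)))
    return shuffled[:n_train].tolist(), shuffled[n_train:].tolist()


# Split sizes reported in the text; integer ids stand in for compound names.
mito_train, mito_test = split_compounds(range(285), n_train=190)
cardio_train, cardio_test = split_compounds(range(100), n_train=68)
```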

ROS and MMP Measurement

To measure ROS and MMP, we utilized cell phenotype features calculated from our high-content images of the 100 compounds used for cardiotoxicity assessment. We designed two CellProfiler analysis protocols that automatically detect and quantify fluorescence intensity, which proved useful for analyzing large image datasets. The ROS and MMP measurements were obtained from the suppressed fluorescent channels of the H2DCFDA and TMRM cell-based assay kit. We computed a reduced dataset containing one mean feature vector per well and then normalized all features by subtracting the mean of each plate layout from each feature.
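
The aggregation and normalization step can be illustrated with pandas. This is a sketch under stated assumptions: the column names (`plate`, `well`, `ros`) are hypothetical, and "subtracting the mean of each plate layout" is read here as centering each feature on its per-plate mean.

```python
import pandas as pd


def well_profiles(cells: pd.DataFrame, feature_cols):
    """Average per-cell features within each well, then center them per plate."""
    # One mean feature vector per (plate, well).
    wells = cells.groupby(["plate", "well"], as_index=False)[feature_cols].mean()
    # Per-plate mean of each feature, aligned row by row with `wells`.
    plate_means = wells.groupby("plate")[feature_cols].transform("mean")
    wells[feature_cols] = wells[feature_cols] - plate_means
    return wells


# Made-up intensities for illustration only.
cells = pd.DataFrame(
    {
        "plate": ["P1", "P1", "P1", "P2"],
        "well": ["A1", "A1", "A2", "A1"],
        "ros": [1.0, 3.0, 5.0, 10.0],
    }
)
profiles = well_profiles(cells, ["ros"])
```

On this toy input, plate P1 has well means 2.0 and 5.0, which become -1.5 and 1.5 after centering, and the single P2 well becomes 0.0.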

X-Profiler model Design

X-Profiler is a flexible plug-and-play model framework that is not restricted to a particular backbone. The model processes cell-slice images by normalizing them and grouping them by compound and concentration. First, single-cell locations are extracted with CellProfiler by calculating the center coordinates of each nucleus. Second, images are rescaled with the global fluorescence intensity. Third, DeepProfiler crops cell slices from 3 channels based on the x and y coordinates of the center of a single nucleus. In all experiments, cell slices were cropped from a region of 96 × 96 pixels centered on the nucleus, without resizing. The resulting 96 × 288-pixel images were preserved for model construction.
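
The cropping step can be sketched with NumPy. This is not DeepProfiler's implementation: `crop_cell_slice` is a hypothetical helper, the side-by-side channel layout is inferred from the 96 × 288 size, and zero-padding at the image border is an assumption.

```python
import numpy as np


def crop_cell_slice(image, center_xy, size=96):
    """Crop a size x size window around a nucleus center from every channel.

    image: array of shape (channels, height, width).
    center_xy: (x, y) coordinates of the nucleus center in pixels.
    Returns the per-channel crops laid side by side, shape (size, channels * size).
    """
    half = size // 2
    x, y = int(round(center_xy[0])), int(round(center_xy[1]))
    # Zero-pad so that windows near the border keep the full size (assumption).
    padded = np.pad(image, ((0, 0), (half, half), (half, half)), mode="constant")
    # In padded coordinates the window starts at (y, x) and is centered on the nucleus.
    window = padded[:, y:y + size, x:x + size]
    return np.concatenate(list(window), axis=1)


# Synthetic 3-channel field of view for illustration.
field = np.arange(3 * 200 * 300, dtype=float).reshape(3, 200, 300)
tile = crop_cell_slice(field, center_xy=(150, 100))
corner_tile = crop_cell_slice(field, center_xy=(0, 0))
```

With three channels and a 96-pixel window, each tile has shape (96, 288), matching the image size given in the text.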

Cell-slice images stained with different dyes were integrated as different channels into a single multichannel cell-slice image. This approach reduces noise interference and allows the model to perform the classification task using a few representative cells. Furthermore, during training only a limited number of cell images are input at a time, which prevents the premature overfitting that the complex information in too many cell images would otherwise cause.

The X-Profiler model consists of three main parts: a single cell-slice encoder, a multiple cell-slice encoder, and a classifier. The single cell-slice encoder uses a shared CNN (ResNet18) to obtain an unbiased latent representation of each single cell-slice image and to improve the generalization of that representation. The multiple cell-slice encoder is built on a Transformer, whose strong contextualization capability integrates information from every element in the sequence. A CLS token is prepended to the sequence to filter out irrelevant cell representations and improve model robustness. The self-attention mechanism of the Transformer fuses the latent representations into the CLS token representation, which is then fed into a linear-layer classifier. The overall flow of prediction is illustrated in Fig. 1.
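
The three-part structure can be sketched in PyTorch as follows. This is an illustrative outline rather than the published model: `TinyEncoder` is a small stand-in for the ResNet18 backbone (chosen to keep the example light), and the embedding size, layer count, head count, and input layout of (batch, slices, channels, height, width) are all assumptions.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Stand-in single cell-slice encoder; the paper uses ResNet18 here."""

    def __init__(self, in_channels: int = 3, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.net(x)  # (batch, dim)


class MultiSliceClassifier(nn.Module):
    """Shared CNN encoder -> Transformer over cell slices -> linear head on CLS."""

    def __init__(self, encoder, dim=64, num_classes=2, num_layers=2, num_heads=4):
        super().__init__()
        self.encoder = encoder
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=4 * dim,
            dropout=0.1, batch_first=True,
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, slices):
        # slices: (batch, n_slices, channels, height, width)
        b, n = slices.shape[:2]
        # The same encoder is applied to every cell slice.
        tokens = self.encoder(slices.flatten(0, 1)).view(b, n, -1)
        # Prepend the learnable CLS token to each sequence.
        seq = torch.cat([self.cls_token.expand(b, -1, -1), tokens], dim=1)
        fused = self.transformer(seq)
        # Classify from the CLS position only.
        return self.head(fused[:, 0])
```

Because the encoder is passed in as an argument, a different backbone can be substituted as long as it maps a batch of images to vectors of length `dim`, which mirrors the plug-and-play design described above.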

X-Profiler Training and Inference

During training, we used standard supervised learning with cross-entropy as the loss function, together with learning-rate warm-up and cosine decay. To account for various experimental and computational factors, as well as potential inaccuracies in the labels themselves, we also employed label smoothing to reduce the model's overconfidence.
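
A minimal NumPy sketch of these two ingredients is given below. The linear warm-up shape, the smoothing value of 0.1, and the function names are assumptions, since the text does not report the hyperparameters. The smoothed target places 1 - ε on the true class plus ε/K spread uniformly over all K classes.

```python
import math

import numpy as np


def lr_at_step(step, total_steps, warmup_steps, base_lr):
    """Linear warm-up to base_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def smoothed_cross_entropy(logits, labels, smoothing=0.1):
    """Mean cross-entropy against label-smoothed one-hot targets."""
    n, k = logits.shape
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Uniform mass smoothing / k everywhere, plus 1 - smoothing on the true class.
    targets = np.full((n, k), smoothing / k)
    targets[np.arange(n), labels] += 1.0 - smoothing
    return float(-(targets * log_probs).sum(axis=1).mean())
```

With smoothing, a confident wrong label costs less than under plain cross-entropy, which is the motivation the text gives for using it with potentially noisy toxicity labels.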

In the testing phase, we made a slight modification to the training approach. Whereas in training we randomly selected 8 cell-slice images to form each multiple cell-slice sequence, in testing we increased the number of cell slices per sequence, evaluating sequence lengths of 12, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, and 750. To balance model performance against computational cost, we randomly sampled 100 sets of multiple cell slices and used the average classification result over those sequences as the final outcome. This approach improves the chances of selecting valuable cell-slice images while reducing computational cost, because not every individual cell has to be evaluated. The optimized X-Profiler models were trained for up to 50 epochs. To prevent overfitting, dropout and AdamW weight decay were used for regularization. Model training was implemented using PyTorch (1.12.0 + cu116). A workstation with an Intel(R) Xeon(R) Gold 5320 CPU (104 core processors, 2.2 GHz) and an NVIDIA A100 graphics-processing unit was used for model training.
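
The test-time averaging can be sketched as follows. `predict_compound` and the `predict_proba` callable are hypothetical names; the callable stands in for a trained model that maps one set of cell slices to class probabilities, and sampling with replacement when a compound has fewer cells than the set size is an assumption.

```python
import numpy as np


def predict_compound(cell_slices, predict_proba, set_size=100, n_sets=100, seed=0):
    """Average class probabilities over randomly sampled sets of cell slices."""
    rng = np.random.default_rng(seed)
    n_cells = len(cell_slices)
    # Fall back to sampling with replacement if there are too few cells.
    replace = n_cells < set_size
    probs = []
    for _ in range(n_sets):
        idx = rng.choice(n_cells, size=set_size, replace=replace)
        probs.append(predict_proba(cell_slices[idx]))
    return np.mean(probs, axis=0)


def dummy_model(batch):
    """Placeholder predictor: 'toxic' probability is the mean of the batch."""
    p = float(batch.mean())
    return np.array([1.0 - p, p])


# Toy per-cell values in [0, 1] standing in for encoded cell slices.
toy_cells = np.random.default_rng(1).uniform(size=(500, 4))
compound_prob = predict_compound(toy_cells, dummy_model, set_size=100, n_sets=100)
```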

X-Profiler attention map

Because we used the CLS token embeddings generated by the Transformer for classification, we obtained the attention map by calculating the dot product between the query and key representations of the tokens in the Transformer. After the softmax function is applied, each element of the attention map takes a value between 0 and 1.
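
In the notation of the standard Transformer, the attention map is $A = \mathrm{softmax}(QK^\top / \sqrt{d})$, where the $1/\sqrt{d}$ scaling is the usual convention and is assumed here rather than stated in the text. A small NumPy sketch, with token 0 playing the role of the CLS token and `attention_map` as an illustrative name:

```python
import numpy as np


def attention_map(queries, keys):
    """Row-wise softmax of scaled query-key dot products.

    queries, keys: arrays of shape (n_tokens, d); token 0 is the CLS token.
    Returns an (n_tokens, n_tokens) map whose rows sum to 1.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
q = rng.normal(size=(9, 16))  # 1 CLS token + 8 cell-slice tokens (toy sizes)
k = rng.normal(size=(9, 16))
attn = attention_map(q, k)
cls_scores = attn[0, 1:]      # attention of the CLS token over the cell slices
```

The row for the CLS token gives one score per cell slice, which is the quantity that can be visualized as a per-cell heatmap.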

Cluster and statistical analysis

To evaluate the performance of feature extraction methods for cell-slice image analysis, we employed the t-SNE algorithm in the Python sklearn package to reduce the dimensionality of the features or embeddings computed by X-Profiler, CellProfiler, and DeepProfiler. We then examined the resulting component distributions to investigate the relationship between X-Profiler embeddings and the cell phenotype observed in the images, as well as the effectiveness of X-Profiler compared with CellProfiler or DeepProfiler in capturing useful information. To assess clustering performance, we calculated the silhouette coefficient and the adjusted Rand index using the silhouette score and adjusted Rand score functions in sklearn. We applied the Kruskal‒Wallis test to test the null hypothesis that the MMP or ROS intensities of images treated with a compound at six different concentrations (0.04 µM, 0.12 µM, 0.37 µM, 1.11 µM, 3.33 µM, and 10 µM) were equal, setting significance at p < 0.05.
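
The calls involved can be sketched with scikit-learn and SciPy on synthetic data. The data, group sizes, and t-SNE settings below are placeholders chosen only so that the example runs quickly; they are not the values used in the study.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for embeddings: 3 well-separated groups of 20 points.
labels = np.repeat([0, 1, 2], 20)
features = rng.normal(size=(60, 16)) + 10.0 * labels[:, None]

# 2-D t-SNE projection for visual inspection.
embedding = TSNE(
    n_components=2, perplexity=10, init="pca", random_state=0
).fit_transform(features)

# Silhouette coefficient of the features with respect to the known groups.
sil = silhouette_score(features, labels)

# Adjusted Rand index between two labelings; it ignores label permutations.
ari = adjusted_rand_score(labels, (labels + 1) % 3)

# Kruskal-Wallis test across six groups, one per concentration (toy intensities).
intensity_by_conc = [
    rng.normal(loc=shift, size=30) for shift in (0.0, 0.0, 0.5, 1.0, 2.0, 3.0)
]
statistic, p_value = kruskal(*intensity_by_conc)
significant = p_value < 0.05
```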

Competing interests

The authors declare that they have no conflict of interest.

Author contributions

Xiangrui Guo, Xiaodong Wang and Lipeng Lai conceived this research. Fan Zhang, Mengcheng Yao and Xueyu Guo curated the dataset. Xiangrui Gao, Fan Zhang and Xueyu Guo performed data analysis. Xiangrui Gao devised deep learning algorithms. Xiaoxiao Wang and Dong Chen conducted the HCS experiments. Xiangrui Gao, Xueyu Guo, Fan Zhang, Mengcheng Yao and Xiaoxiao Wang wrote and modified the paper. Xiaodong Wang and Lipeng Lai supervised this work.

Ashburn, T.T., Thor, K.B.: Drug repositioning: identifying and developing new uses for existing drugs. Nat. Rev. Drug Discovery. 3, 673–683 (2004)
Ni, J., et al.: Obstacles and opportunities in Chinese pharmaceutical innovation. Globalization health. 13, 1–9 (2017)
Perakakis, N., Yazdani, A., Karniadakis, G.E., Mantzoros, C.: Vol. 87, A1–A9 (Elsevier, 2018)
Paananen, J., Fortino, V.: An omics perspective on drug target discovery platforms. Brief. Bioinform. 21, 1937–1953 (2019). 10.1093/bib/bbz122
Lusher, S.J., McGuire, R., van Schaik, R.C., Nicholson, C.D., de Vlieg, J.: Data-driven medicinal chemistry in the era of big data. Drug discovery today. 19, 859–868 (2014)
Mehonic, A., et al.: Memristors—From in-memory computing, deep learning acceleration, and spiking neural networks to the future of neuromorphic and bio‐inspired computing. Adv. Intell. Syst. 2, 2000085 (2020)
Chen, H.: Machine learning for information retrieval: Neural networks, symbolic learning, and genetic algorithms. J. Am. Soc. Inform. Sci. 46, 194–216 (1995)
Tian, G., Harrison, P.J., Sreenivasan, A.P., Carreras-Puigvert, J., Spjuth, O.: Combining molecular and Cell Painting image data for mechanism of action prediction. Artif. Intell. Life Sci. 3, 100060 (2023)
Bickle, M.: The beautiful cell: high-content screening in drug discovery. Anal. Bioanal. Chem. 398, 219–226 (2010)
Perlman, Z.E., et al.: Multidimensional drug profiling by automated microscopy. Science. 306, 1194–1198 (2004)
Snijder, B., Pelkmans, L.: Origins of regulated cell-to-cell variability. Nat. Rev. Mol. Cell Biol. 12, 119–125 (2011)
Ljosa, V., et al.: Comparison of methods for image-based profiling of cellular morphological responses to small-molecule treatment. J. BioMol. Screen. 18, 1321–1329 (2013)
Young, D.W., et al.: Integrating high-content screening and ligand-target prediction to identify mechanism of action. Nat. Chem. Biol. 4, 59–68 (2008)
Sundaramurthy, V., et al.: Integration of chemical and RNAi multiparametric profiles identifies triggers of intracellular mycobacterial killing. Cell. host microbe. 13, 129–142 (2013)
Akbarzadeh, M., et al.: Morphological profiling by means of the Cell Painting assay enables identification of tubulin-targeting compounds. Cell Chemical Biology 29, 1053–1064. e1053 (2022)
Castoreno, A.B., et al.: Small molecules discovered in a pathway screen target the Rho pathway in cytokinesis. Nat. Chem. Biol. 6, 457–463 (2010)
Lejal, V., Cerisier, N., Rouquié, D., Taboureau, O.: Assessment of Drug-Induced Liver Injury through Cell Morphology and Gene Expression Analysis. Chem. Res. Toxicol. 36, 1456–1470 (2023)
Loo, L.-H., et al.: An approach for extensibly profiling the molecular states of cellular subpopulations. Nat. Methods. 6, 759–765 (2009)
Feldman, D., et al.: Pooled genetic perturbation screens with image-based phenotypes. Nat. Protoc. 17, 476–512 (2022)
Feldman, D., et al.: Optical pooled screens in human cells. Cell. 179, 787–799 (2019)
Funk, L., et al.: The phenotypic landscape of essential human genes. Cell. 185, 4634–4653 (2022)
Liberali, P., Snijder, B., Pelkmans, L.: A hierarchical map of regulatory genetic interactions in membrane trafficking. Cell. 157, 1473–1487 (2014)
Fischer, B., et al.: A map of directional genetic interactions in a metazoan cell. Elife. 4, e05464 (2015)
Yin, Z., et al.: A screen for morphological complexity identifies regulators of switch-like transitions between discrete cell shapes. Nat. Cell Biol. 15, 860–871 (2013)
Chandrasekaran, S.N., et al.: JUMP Cell Painting dataset: morphological impact of 136,000 chemical and genetic perturbations. bioRxiv, 2003. 2023.534023 (2023)
Stirling, D.R., et al.: CellProfiler 4: improvements in speed, utility and usability. BMC Bioinform. 22, 1–11 (2021)
Dao, D., et al.: CellProfiler Analyst: interactive data exploration, analysis and classification of large biological image sets. Bioinformatics. 32, 3210–3212 (2016)
Rocha, L., Velho, L., Carvalho, P.C.P.: in Proceedings. XV Brazilian Symposium on Computer Graphics and Image Processing. 99–105 (IEEE)
Haralick, R.M., Shanmugam, K., Dinstein, I.H.: Textural features for image classification. IEEE Trans. Syst. man. cybernetics, 610–621 (1973)
Moshkov, N., et al.: Learning representations for image-based profiling of perturbations. bioRxiv, 2022.2008. 503783 (2022)
Alber, M., et al.: Integrating machine learning and multiscale modeling—perspectives, challenges, and opportunities in the biological, biomedical, and behavioral sciences. NPJ Digit. Med. 2, 115 (2019)
Loo, L.-H., Wu, L.F., Altschuler, S.J.: Image-based multivariate profiling of drug responses from single cells. Nat. Methods. 4, 445–453 (2007)
Vaswani, A., et al.: Attention is all you need. Adv. Neural. Inf. Process. Syst. 30 (2017)
Richard, A.M., et al.: The Tox21 10K compound library: collaborative chemistry advancing toxicology. Chem. Res. Toxicol. 34, 189–216 (2020)
Begriche, K., Massart, J., Robin, M.-A., Borgne-Sanchez, A., Fromenty, B.: Drug-induced toxicity on mitochondria and lipid metabolism: mechanistic diversity and deleterious consequences for the liver. J. Hepatol. 54, 773–794 (2011)
Hargreaves, I.P., Al Shahrani, M., Wainwright, L., Heales, S.J.: Drug-induced mitochondrial toxicity. Drug Saf. 39, 661–674 (2016)
Tang, X., Wang, Z., Hu, S., Zhou, B.: Assessing drug-induced mitochondrial toxicity in cardiomyocytes: implications for preclinical cardiac safety evaluation. Pharmaceutics. 14, 1313 (2022)
Haghighi, M., Caicedo, J.C., Cimini, B.A., Carpenter, A.E., Singh, S.: High-dimensional gene expression and morphology profiles of cells across 28,000 genetic and chemical perturbations. Nat. Methods. 19, 1550–1557 (2022)
Meyer, J.N., Hartman, J.H., Mello, D.F.: Mitochondrial Toxicity. Toxicol. Sci. 162, 15–23 (2018). 10.1093/toxsci/kfy008
de Wit, S., Glen, C., de Boer, R.A., Lang, N.N.: Mechanisms shared between cancer, heart failure, and targeted anti-cancer therapies. Cardiovascular. Res. 118, 3451–3466 (2022)
Niemann, B., et al.: Caloric restriction reduces sympathetic activity similar to beta-blockers but conveys additional mitochondrio-protective effects in aged myocardium. Sci. Rep. 11, 1931 (2021)
Pereira, C.: Drug-induced cardiac mitochondrial toxicity and protection: from doxorubicin to carvedilol. Curr. Pharm. Design. 17, 2113–2129 (2011)
Lozano-Sepúlveda, S.A., Rincón-Sanchez, A.R., Rivas-Estilla, A.M.: Antioxidants benefits in hepatitis C infection in the new DAAs era. Ann. Hepatol. 18, 410–415 (2019)
Reshi, M.L., Su, Y.-C., Hong, J.-R.: RNA viruses: ROS-mediated cell death. Int. J. cell. biology 2014 (2014)
Ghosh, M.C., et al.: Therapeutic inhibition of HIF-2α reverses polycythemia and pulmonary hypertension in murine models of human diseases. Blood J. Am. Soc. Hematol. 137, 2509–2519 (2021)

SupplFig1.png
Supplementary Figure 1: Attention heatmap of the cell painting image treated with alfuzosin. Colored solid dots mark the cells; the color changes gradually from dark blue to bright pink as the attention score increases.
SupplementaryTable.xlsx
rs.pdf
Reporting Summary
