Scanning acoustic microscopy - experiment and statistically relevant image data. We apply scanning acoustic microscopy (SAM) to scan the TSV arrays. In particular, we employ a technique utilizing special acoustic lenses with a nominal frequency of 100 MHz and an opening angle of 60°, suitable to fulfill the demands of modern TSV inspection and analysis; see the Method section for further details. The methodology represents an extremely time- and cost-efficient approach with the ability to collect a statistically relevant number of TSVs at a resolution sufficient for further image post-processing and quantification. Figure 1A shows an exemplary projection of the ultrasound data onto the x-y plane, creating the so-called C-scan image of a quarter-wafer piece. The region of interest (ROI) of the C-scan contains approximately 10,000 TSVs. Each TSV can be associated with a characteristic pattern generated by defocusing the lens, see Method section. For better visualization of the characteristic patterns, the region is divided into a C-scan image patch illustrating about 800 TSVs (ROI-1), as shown in Fig. 1B. It can be further subdivided into an ROI (ROI-2) with six TSVs (Fig. 1C). Figure 1D presents two exemplary patterns generated by defocusing the acoustic lens and exciting Rayleigh waves30. The two patterns indicate a TSV with and without an inhomogeneity, marked with a ‘chartreuse’ green and a red rectangle, respectively.
Workflow of the End-to-End CNN model approach. Figure 2 illustrates the automated TSV failure analysis workflow with respect to training and testing, based on the extracted C-scan SAM data. The workflow consists of two sequentially linked CNN architectures, which we refer to as the End-to-End Convolutional Neural Network (E2E-CNN). The first CNN (CNN1) is dedicated to localizing the TSVs, whereas the second CNN (CNN2) is capable of classifying thousands of TSVs; see the Method section for further details.
As input for CNN1, SAM C-scan image data of arbitrary size is used. The output provides image patches with characteristic TSV patterns. CNN2 classifies the TSVs according to their quality and utilizes the output of CNN1 as input. In order to train CNN1, we use two sets of labelled data incorporating C-scan image patches with and without TSVs, respectively. In the output of CNN1, all detected characteristic patterns are marked in ‘chartreuse’ color for the exemplary ROI with 36 TSVs, whereas the output of CNN2 is color-coded based on the quality of the TSVs. We train CNN2 with five different classes, labelled class 1 to 5, which are assigned according to the patterns found in the C-scan data. Within the exemplary output of CNN2 shown here, 33 TSVs are assigned to class 1, two to class 2 and one TSV to class 3. Classes 4 and 5 are not found in the exemplary ROI.
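The sequential chaining of the two networks can be sketched as follows; a minimal illustration with hypothetical stand-in predictors (`fake_cnn1`, `fake_cnn2`) in place of the trained networks, which are not reproduced here:

```python
import numpy as np

WINDOW = 100  # bounding-box edge length in pixels, as used in the workflow

def run_e2e(c_scan, cnn1_predict, cnn2_predict):
    """Chain the two stages: CNN1 yields bounding-box corners on the
    C-scan, CNN2 assigns a quality class (1-5) to each extracted patch."""
    boxes = cnn1_predict(c_scan)
    results = []
    for (x, y) in boxes:
        patch = c_scan[y:y + WINDOW, x:x + WINDOW]
        results.append(((x, y), cnn2_predict(patch)))
    return results

# Dummy predictors standing in for the trained networks:
fake_cnn1 = lambda img: [(0, 0), (120, 40)]   # two "detected" TSVs
fake_cnn2 = lambda patch: 1                   # every TSV assigned class 1
image = np.zeros((300, 300))
print(run_e2e(image, fake_cnn1, fake_cnn2))
```

The point of the composition is that CNN2 never sees the full C-scan, only the 100 × 100 pixel patches proposed by CNN1.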
Efficient TSV localization based on a non-sequential sliding window detection for CNN1. There have been several advancements in the field of computer vision for object detection31. Many authors have proposed object localization techniques such as CNN-based segmentation and sliding window approaches32–34. Figure 3A illustrates the sliding window detection utilized for the TSV localization. A window with a size of 100 × 100 pixels is chosen to slide over the C-scan image with strides Sx and Sy in the x and y directions. This window size is well suited to cover the characteristic pattern of every TSV. For the training of CNN1, each of these windows is fed individually to locate the TSVs in the SAM C-scan images. Two image sets are generated: the first set contains C-scan images with a TSV in the center of the bounding box; for the second set, images with background only and/or images with TSVs not centred (see Supplementary Fig. S3) are used. Since there are only two categorical features in the dataset for CNN1, using one-hot encoding35,36 we assign the binary code ‘1’ to the first set and ‘0’ to the second set during training.
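The stride-based window extraction described above can be sketched as follows; a NumPy illustration in which the window size matches the 100 × 100 pixels used in the workflow, while the stride values and image dimensions are arbitrary placeholders:

```python
import numpy as np

def sliding_windows(c_scan, win=100, sx=20, sy=20):
    """Yield (x, y, patch) for every win x win window obtained by
    sliding over the C-scan image with strides sx / sy."""
    h, w = c_scan.shape
    for y in range(0, h - win + 1, sy):
        for x in range(0, w - win + 1, sx):
            yield x, y, c_scan[y:y + win, x:x + win]

image = np.random.rand(300, 400)
patches = list(sliding_windows(image))
print(len(patches))  # 11 rows x 16 columns = 176 windows for a 300 x 400 image
```

Each patch would then be labelled ‘1’ (centred TSV) or ‘0’ (background or off-centre TSV) for training.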
We employ a non-sequential sliding window detection approach, as illustrated in Fig. 3B; see the Method section for further details. A major disadvantage of a sequential approach34, see also Supplementary Fig. S6, is the computational cost as well as the time required for training and testing. We show that by using a convolutional layer at the end node34 the training time can be reduced from hours to minutes. During testing, the model predicts multiple bounding boxes37 based on whether the extracted features from the window belong to the first or the second set, i.e. 1 or 0, respectively. Non-Maximal Suppression (NMS) is applied (see Supplementary Fig. S4) to find the predictions with the highest confidence score, yielding the best bounding box with a size of 100 × 100 pixels that defines the TSV as an object. The prediction of CNN1 for an exemplary C-scan image with three TSVs is shown in Fig. 3B. Detected TSVs are represented by ‘chartreuse’ colored bounding boxes. The predictions of CNN1 for a larger C-scan image with 864 TSVs are shown in Fig. S5.
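The NMS step can be illustrated with a minimal sketch; the box coordinates, confidence scores and IoU threshold below are illustrative, not taken from the study:

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a; bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-confidence box and suppress all boxes that
    overlap it beyond the threshold; repeat on the remainder."""
    order = np.argsort(scores)[::-1]  # indices, best score first
    keep = []
    while len(order):
        best = order[0]
        keep.append(best)
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_thresh])
    return keep

boxes = [(0, 0, 100, 100), (10, 10, 100, 100), (300, 300, 100, 100)]
scores = [0.9, 0.8, 0.95]
print(nms(boxes, scores))  # the overlapping duplicate of the first box is suppressed
```

Of the two heavily overlapping candidate boxes, only the higher-scoring one survives, leaving one box per TSV.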
Training of CNN2 and input for TSV classification. The second CNN (CNN2) classifies the TSVs located in the SAM C-scan image. Therefore, the inputs of CNN2 are the predictions from CNN1. According to the C-scan data obtained from the SAM characterization, we define five classes for the training datasets, see Fig. 4. The first TSV class exhibits concentric circular, non-disturbed fringes at the TSV location within the C-scan image. The second TSV class exhibits a single inhomogeneity within the circular fringes at various positions along the circumference of the fringes. The third TSV class represents patterns with multiple inhomogeneities along the fringe circumference. Classes 4 and 5 represent patterns originating from water bubbles and further scanning artefacts, e.g. from gating errors or high scanning speed. Further details on the architecture of CNN2 and the corresponding layer parameters are given in Supplementary Fig. S7 and the Method section.
Validation of the E2E-CNN. Figure 5A,B illustrate the training and testing accuracy for the two CNN models. For the TSV localization (CNN1) we achieve an accuracy of 100% for training and testing, i.e. we are able to detect every TSV in the SAM C-scan image, see Fig. 5A. Figure 5B shows an accuracy of over 98% for CNN2, dedicated to the TSV classification, with respect to both training and testing. Further, to show the performance of the E2E-CNN model, we plot the training and validation loss as a function of epochs for CNN1 and CNN2 in Supplementary Fig. S8.
In Fig. 5C a representative SAM C-scan image displays the fully automated localization and classification of TSVs, exemplarily for classes 1, 2 and 3. Images of classes 1 to 3 at higher magnification are shown in Fig. 5D, indicating the different patterns also shown in Fig. 4. For further validation, we compare the SAM C-scan images for classes 1, 2 and 3 with correlated SEM characterization results. For class 1, the SEM data in Fig. 5E reveal no inhomogeneity, neither on the sidewall of the TSV nor at the bottom. This matches the observation made for the C-scan SAM image, where no inhomogeneity in the fringes is exhibited. The SEM data for class 2 show a large accretion at the bottom of the TSV as well as on the sidewall. Correspondingly, the SAM image shows a characteristic pattern indicating a single inhomogeneity within the fringes, see Fig. 5C. For class 3, the SEM data show a delamination within the sidewall, see Fig. 5E. Here, the C-scan SAM image shows a pattern with multiple inhomogeneities in the fringes, see Fig. 5C. Based on the correlated SEM data, a clear assignment of the different C-scan patterns can be made.
The detection and quality prediction of 864 TSVs from SAM C-scan images of the wafer is shown in Supplementary Fig. S9.
Comparison of model performance between the automatic E2E-CNN and semi-automated ML models. In the following, we compare the developed E2E-CNN model with the semi-automated ML models. For the semi-automated models, we utilize a multilayer perceptron (MLP), a Decision Tree and a Random Forest, as shown in Table 1. For the semi-automated analysis, detecting the TSVs requires a geometry-based pattern recognition algorithm such as the circular Hough transform38,39. The data labelling applied for the training and the feature extraction steps are the same for MLP, Decision Tree and Random Forest. For the training of the semi-automated ML models, we define two TSV configurations: the first shows TSVs with non-disturbed fringes and the second TSVs with disturbed fringes in the SAM C-scan images, see Supplementary Fig. S10.
Figure 6A illustrates the TSV localization for the semi-automated ML analysis. Patches with a size of 100 × 100 pixels showing the characteristic patterns are used, followed by the detection of TSVs using the circular Hough transform, see Fig. 6A. For the extraction of relevant features, we compare two procedures, namely Canny Edge Detection (CED)40 and a unique segmentation approach we developed, the Fringe Segmentation Technique (FriST); see Supplementary Figs. S11, S12 and the Method section for further details.
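A minimal single-radius circular Hough transform, the geometric idea behind this detection step, might look as follows; a pure-NumPy sketch on a synthetic edge image (library implementations such as OpenCV's `HoughCircles` additionally search over a range of radii):

```python
import numpy as np

def hough_circle(edges, radius):
    """Circular Hough transform for one known radius: every edge pixel
    votes for all candidate centres lying on a circle around it; the
    accumulator maximum is the most likely circle centre."""
    h, w = edges.shape
    acc = np.zeros((h, w))
    thetas = np.linspace(0, 2 * np.pi, 100, endpoint=False)
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        cy = np.round(y - radius * np.sin(thetas)).astype(int)
        cx = np.round(x - radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)
    return np.unravel_index(np.argmax(acc), acc.shape)  # (cy, cx)

# Synthetic edge image: a circle of radius 30 centred at (50, 60)
edges = np.zeros((100, 120))
t = np.linspace(0, 2 * np.pi, 360)
edges[np.round(50 + 30 * np.sin(t)).astype(int),
      np.round(60 + 30 * np.cos(t)).astype(int)] = 1
print(hough_circle(edges, 30))  # accumulator peak near (50, 60)
```

In the semi-automated workflow, the recovered centre fixes the position of the 100 × 100 pixel patch from which CED or FriST features are then extracted.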
For the binary classification of TSVs, the feature extraction using CED or FriST is followed by dimensionality reduction using Principal Component Analysis (PCA), see Fig. 6B. By applying PCA, we select the most important features from the output of CED or FriST as input to train the model, see the Method section for further details. The performance of all investigated models is summarized in Table 1.
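The PCA-based dimensionality reduction can be sketched via the SVD of the mean-centred feature matrix; the matrix dimensions below are illustrative, not the actual CED/FriST feature counts:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X (one feature vector per TSV patch) onto
    the first n principal components, computed from the SVD of the
    mean-centred data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# 10 flattened feature vectors (e.g. CED or FriST outputs), 50 features each
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))
Z = pca_reduce(X, 3)
print(Z.shape)  # (10, 3): each patch reduced to 3 principal components
```

The retained components are ordered by explained variance, so the classifier trains on the few directions that capture most of the variation between disturbed and non-disturbed fringe patterns.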
For all three semi-automated models, the FriST technique improves the accuracy over the CED technique (see Supplementary Fig. S13) due to the specific extraction of the desired features for training.
However, a general disadvantage of the semi-automated models is the requirement of a specific feature extraction to train and test the model. Here, the quality and resolution of the SAM C-scan images are crucial for the subsequent labelling of the pattern associated with each TSV. Therefore, the semi-automated models do not provide an optimal solution when it comes to detecting and classifying statistically large numbers of TSVs, since increasing the ROI decreases resolution and contrast.
Clearly, the E2E-CNN workflow, which does not rely on any manual feature extraction technique, outperforms the semi-automated ML-based prediction models with respect to testing time and accuracy, as shown in Table 1. Indeed, none of the semi-automated models reaches an accuracy of over 90% or a testing time below 10 min.
Table 1. Comparison of TSV classification performance using E2E-CNN and semi-automated TSV classification models
Comparison between the semi-automated ML models (MLP, DT & RF) and the developed fully automated E2E-CNN model for TSV classification from SAM C-scan image data with 96 TSVs for validation. For the necessary feature extraction of the semi-automated models, we use the CED and FriST techniques.
Statistical analysis of SAM C-scan images obtained from the E2E-CNN model. In the following, we utilize the developed E2E-CNN model to highlight the statistical possibilities for failure analysis. Figure 7A illustrates the defect map from the state-of-the-art automatic optical microscopy (AOM) inspection. The black dots in Fig. 7A mark probable defect locations. For comparison with the SAM measurements and the incorporated E2E-CNN model, we select four ROIs, labelled A, B, C and D, at distinct wafer locations. We evaluate the statistics with respect to the occurrence of TSVs with and without inhomogeneities. Each ROI selected for the underlying analysis consists of 576 TSVs. Figures 7B,C exemplarily illustrate the defect map obtained from the SAM image data and the subsequent E2E-CNN analysis for ROI D, together with a further magnification for ROI D-1. The latter indicates the TSV locations as well as the TSV classification according to classes 1-5. Further C-scans with prediction results based on the E2E-CNN model for ROI A, ROI B and ROI C are shown in Supplementary Material S14.
Notably, according to Fig. 7D, the extracted statistics illustrate a similar trend for the AOM- and the SAM-based methods. For both approaches, ROI C shows the highest count of TSVs with inhomogeneities and ROI D the lowest. The depicted results are summarized in Table 2. However, the SAM-based inspection utilizing the E2E-CNN model detects a higher number of TSVs with inhomogeneities than the optical inspection.
Further, we provide in Fig. 7E statistical information with respect to the different classes predicted by the E2E-CNN model for ROIs A-D. Indeed, class 2, which according to Fig. 5 comprises a mixture of sidewall and bottom defects, exhibits the highest count of detected inhomogeneities. We argue that the observed difference between AOM and SAM is explained by the restricted ability of the optical inspection, which detects inhomogeneities at the bottom but not in the sidewall. The results therefore indicate that more sidewall than bottom inhomogeneities are present in the TSV array.
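Aggregating the per-TSV predictions of the E2E-CNN into per-class counts of the kind shown in Fig. 7E amounts to a histogram over the class labels; the labels below are randomly generated placeholders, not the measured data:

```python
import numpy as np

# Hypothetical class predictions (1-5) for the 576 TSVs of one ROI
rng = np.random.default_rng(1)
pred = rng.integers(1, 6, size=576)

counts = np.bincount(pred, minlength=6)[1:]  # index 0 unused; classes 1-5
print(dict(zip(range(1, 6), counts)))
```

Repeating this per ROI yields the class statistics compared against the AOM defect counts in Table 2.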
Table 2. Statistics showing the different classes 1 to 5 extracted from the E2E-CNN model for ROI A, ROI B, ROI C and ROI D in comparison with the AOM inspection result.