Combining quantitative analysis with an elliptic Fourier descriptor: A study of pottery from the Gansu-Zhanqi site based on 3D scanning and computer technology

Abstract: Pottery is an important material in archaeological studies, and the accurate classification of pottery shapes largely depends on the experience and knowledge of archaeologists. In this paper, pottery from the Gansu-Zhanqi site is used as the sample. Three-dimensional (3D) models of the pottery were obtained using 3D scanning, and a computer-assisted pottery typology was studied through quantitative analysis and elliptic Fourier descriptors. This method can enhance and supplement the traditional methods of classifying pottery in archaeology and thereby enrich the parameters and breadth of pottery analysis. It represents a new means of exploring objective classification and provides a new tool for traditional archaeological analysis.


Introduction
Pottery is a common and important ancient material in the study of archaeology, and the classification pattern of each object is an important factor used to determine the type and age of an archaeological culture [1]. Over the past few decades, some archaeologists have utilized mathematical methods such as seriation [2], principal component analysis [3], cluster analysis [4,5], discriminant analysis [6], multivariate statistics [7], correspondence analysis [8], and principal coordinate analysis [9]. Notably, Chinese archaeologists have used cluster analysis, which is a multivariate statistical method, to classify pottery ware [10].
Traditionally, these mathematical methods have been used in isolation, and the data have been measured manually. In this paper, we use a computer program to extract data from three-dimensional (3D) pottery models and apply both quantitative analysis and elliptic Fourier descriptors to these data. Quantitative analysis has been used many times by Chinese archaeologists [11,12,13], and its results have been verified against traditional archaeological research. Elliptic Fourier descriptors (EFDs), originally proposed by Kuhl and Giardina [14], can delineate any type of shape with a closed two-dimensional (2D) contour. Over the past several years, researchers have applied EFDs to various fields, such as agriculture [15], the human body [16,17], and ecology [18], but they have rarely been used in archaeological research. This study combines the results of quantitative analysis and EFDs to study pottery typology. The combined analysis provides additional parameters, which make the final results more accurate.
In this article, we use pottery unearthed from the Gansu-Zhanqi burial site as examples. The Zhanqi site is a part of the Siwa cultural relics (14 BC to 11 BC) along the Tao River in Min County, Gansu Province. Sixty-six tombs were found at the site, along with an additional 20 features that included houses, cooking pits, ash pits and sacrificial remains. The unearthed objects included pottery, bronze ware, stone tools, bone artifacts and ornaments. The layers and times of the tombs are clear, and unearthed artifacts are abundant. The Gansu-Zhanqi site is of great significance for the further study of the Siwa culture, as considerable amounts of pottery were unearthed at the site [19] (Fig. 1), including jars, lis, basins, dous, and bottles, although the majority of the pottery consists of jars. According to the shape of the mouth, the jars are divided into flat and saddle jars [20]. In this article, 62 saddle jars, which are representative pottery from the Gansu-Zhanqi site, were scanned.
First, it is important to note that the pottery model created by 3D scanning can be used to create a 3D virtual display, print a 3D model, perform a virtual restoration of cultural relics, and record 3D data. Moreover, a 3D pottery model is permanent and can be used multiple times as a basic model for further research. This article is only one part of this research. Second, quantitative analysis of pottery has been applied in many archaeological studies. Compared with traditional archaeological methods, the cluster analysis results were basically identical to the results provided by the original reports, and this agreement has been confirmed for the Tianma-Qucun site [13]. While contours can be analyzed in a variety of ways, elliptic Fourier descriptors are a highly suitable method for this paper. Because the pottery of the Gansu-Zhanqi site has not been analyzed through traditional archaeological typology, the results of this pottery classification are the first to be published for this site.

Unifying the Data Extracted from the Pottery Model
The 3D models were acquired using a Creaform Go!SCAN 20 scanner [21], which has an accuracy of 0.1 mm and a resolution of 0.2 mm. Unlike a 3D laser scanner, this scanner uses white light from an LED source and scans quickly. Therefore, it is more suitable for small cultural relics. VXelements [22], Autodesk Meshmixer [23], and 3D Builder [24] were used for the postprocessing stage of the 3D-scanning process (Fig. 2). It is important to note that since the research on pottery shapes requires less accuracy than does a virtual display, the model analysis is not affected by the software processes in this paper. Two kinds of data are extracted from the 3D pottery model: lengths, which are used for the quantitative analysis, and contours, which are used for the elliptic Fourier descriptors. Over the past few years, several data extraction methods for 3D pottery models have been introduced, including those of Spelitz [25], Rasheed [26], Măruţoiu [27], and Angelo [28]. In this paper, the data are extracted using a computer program written in C (Fig. 3). The jars at the Gansu-Zhanqi site are binaural, i.e., they have two ears, which is a representative characteristic of pottery from northwest China. Accordingly, the extraction and analysis of the lengths are based on the XZ plane in the model, which shows the different calibers and heights of the ears, whereas the extraction and analysis of the contours are based on the YZ plane in the model, which shows variations in the overall shapes.

A. Quantitative analysis
Step 1. The first step is to obtain the original data. The measurement areas are presented in Fig. 4. To a certain extent, these areas represent the characteristics of the jar, which are the data used for the quantitative analysis. Step 2. The second step involves using the original data and selecting a different proportion to represent a feature of the pottery ware.
In total, 62 saddle jars were analyzed by examining the overall structure of the pottery model, analyzing the characteristic lengths, and selecting the following meaningful ratios: a. abdominal diameter/height (the highest), which represents the overall shape of the object; b. height of the abdominal diameter/height (the highest), which represents the position of the abdominal diameter; c. height of the ear (b-g)/height (the highest), which is the ratio of the ear height to the overall height; and d. height (the shortest)/height (the highest), which is the degree of the saddle mouth.
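The four ratios in Step 2 can be sketched as follows. This is a minimal illustration, not the authors' C program; the class and field names are hypothetical stand-ins for the measurement areas defined in Fig. 4, and all lengths are assumed to share one unit.

```python
from dataclasses import dataclass


@dataclass
class JarLengths:
    """Lengths measured on the XZ plane of one jar model (hypothetical field names)."""
    abdominal_diameter: float
    abdominal_diameter_height: float  # height at which the abdominal diameter occurs
    ear_height: float
    height_max: float                 # "height (the highest)"
    height_min: float                 # "height (the shortest)"


def shape_ratios(jar: JarLengths) -> tuple:
    """Return the ratios (a, b, c, d) described in Step 2."""
    if jar.height_max <= 0:
        raise ValueError("height_max must be positive")
    h = jar.height_max
    return (
        jar.abdominal_diameter / h,         # a. overall shape
        jar.abdominal_diameter_height / h,  # b. position of the abdominal diameter
        jar.ear_height / h,                 # c. relative ear height
        jar.height_min / h,                 # d. degree of the saddle mouth
    )
```

Because every ratio is divided by the same maximum height, jars of different absolute size become directly comparable, which is presumably why ratios rather than raw lengths are clustered.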
Step 3. The third step is to conduct a quantitative analysis using the ratios obtained in the second step. In this paper, we use the multivariate statistical method of cluster analysis [29]. Cluster analysis groups a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups. It is one of the important techniques in data mining and exploratory data analysis. The similarity distance between samples was calculated using the squared Euclidean distance, which has worked well in similar research [11,12], and a similarity matrix was formed from the distances between all pairs of samples. In the clustering process, the between-class distance can be determined by the single linkage method, the complete linkage method, the median method, the average linkage method, Ward's method, and so on. Ward's method is used in this paper. The distance calculation and Ward's method were performed using SPSS software.
The number of clusters is determined according to the number and characteristics of the samples. In this paper, the number of clusters is based on published studies of similar pottery [11,12,13], whose results have been verified against traditional archaeological research. For the jars, the number of clusters is 9. The cluster analysis results and dendrogram are shown in Table S1 and Figure S1.
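The clustering in Step 3 was run in SPSS. As an illustration only, the following sketch implements Ward's criterion with squared Euclidean distance directly in NumPy. The function name and the naive merge loop are assumptions made for this example and are not the SPSS procedure itself.

```python
import numpy as np


def ward_clusters(X, n_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    X: array of shape (n_samples, n_features), e.g. the four ratios per jar.
    Returns an integer label (0 .. n_clusters-1) for every sample.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]  # start with one cluster per sample
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ia, ib = clusters[a], clusters[b]
                diff = X[ia].mean(axis=0) - X[ib].mean(axis=0)
                # Increase in within-cluster sum of squares if a and b are merged
                cost = len(ia) * len(ib) / (len(ia) + len(ib)) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

For the 62 jars, a call such as `ward_clusters(ratios, 9)` would return one of nine class labels per jar. The cubic-time loop is acceptable at this sample size, although a library routine would be preferable for larger data sets.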
After all the classes are examined, classes with similar characteristics are combined into groups. In the clustering results, a few objects cannot be combined with other classes, and thus, they are regarded as special individuals. The results are as follows.
Group 1 consists of classes 1, 3, 8, and 9 and has the following main characteristics: the overall shape is similar to a square, the length of the ear is a third of the height, and the degree of the saddle mouth is obvious.
Group 2 consists of classes 2 and 5 and has the following main characteristics: the overall shape is lanky and square, the length of the ear is a third of the height, and the degree of the saddle mouth is obvious.
Group 3 consists of classes 4, 6 and 7 and has the following main characteristics: the overall shape is lanky, the length of the ear is a third of the height, and the degree of the saddle mouth is not obvious.

B. Elliptic Fourier descriptors
EFDs, originally proposed by Kuhl and Giardina [14], delineate any type of shape with a closed 2D contour, and the contours are analyzed here using EFDs. In this paper, the calculations are performed using SHAPE [30]. The steps are as follows.
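Step 1. The maximum number of harmonics is determined by the elliptic Fourier series formula, and the EFD of the 2D contour is calculated up to that number of harmonics. The procedure of the elliptic Fourier series approximation transforms the (x, y) coordinate points on the curve in two dimensions into a pair of equations written as functions of a third variable (t). The Fourier coefficients are then calculated based on a discrete Fourier series approximation of the chain-coded boundary contour. The elliptic Fourier series approximation of a closed contour projected on the x- and y-axes is defined as follows:

$$x(t) = A_0 + \sum_{n=1}^{N}\left[a_n \cos\frac{2n\pi t}{T} + b_n \sin\frac{2n\pi t}{T}\right], \qquad y(t) = C_0 + \sum_{n=1}^{N}\left[c_n \cos\frac{2n\pi t}{T} + d_n \sin\frac{2n\pi t}{T}\right] \tag{1}$$

where $t$ is the step required to move along the closed contour, such that $t_{p-1} < t < t_p$ for $1 \le p \le K$; $N$ is the number of Fourier harmonics; and $K$ is the total number of chain-coded points. $A_0$ and $C_0$ are the coefficients corresponding to frequency 0 and define the mean position of the contour. The contour between the (i-1)-th and the i-th chain-coded points is linearly interpolated, and the length of the contour from the starting point to the p-th point and the perimeter of the contour are denoted by $t_p$ and $T$, respectively. $T$ is the basic period of the chain code, which is the overall step used to traverse the entire contour, so that $T = t_K$ and

$$t_p = \sum_{i=1}^{p} \Delta t_i \tag{2}$$

where $\Delta t_i$ is the distance between the (i-1)-th and i-th points. The K-th point is denoted by $(x_K, y_K)$; in general, the coordinates of the p-th point are

$$x_p = \sum_{i=1}^{p} \Delta x_i, \qquad y_p = \sum_{i=1}^{p} \Delta y_i \tag{3}$$

where $\Delta x_i$ and $\Delta y_i$ are the distances along the x- and y-axes between the (i-1)-th and i-th points, respectively. Assuming linear interpolation between the neighboring points, the EFDs in Eq. (1) of the n-th harmonic ($a_n$, $b_n$, $c_n$ and $d_n$) can be calculated using the following equations:

$$\begin{aligned}
a_n &= \frac{T}{2n^2\pi^2}\sum_{p=1}^{K}\frac{\Delta x_p}{\Delta t_p}\left[\cos\frac{2n\pi t_p}{T} - \cos\frac{2n\pi t_{p-1}}{T}\right], &
b_n &= \frac{T}{2n^2\pi^2}\sum_{p=1}^{K}\frac{\Delta x_p}{\Delta t_p}\left[\sin\frac{2n\pi t_p}{T} - \sin\frac{2n\pi t_{p-1}}{T}\right],\\
c_n &= \frac{T}{2n^2\pi^2}\sum_{p=1}^{K}\frac{\Delta y_p}{\Delta t_p}\left[\cos\frac{2n\pi t_p}{T} - \cos\frac{2n\pi t_{p-1}}{T}\right], &
d_n &= \frac{T}{2n^2\pi^2}\sum_{p=1}^{K}\frac{\Delta y_p}{\Delta t_p}\left[\sin\frac{2n\pi t_p}{T} - \sin\frac{2n\pi t_{p-1}}{T}\right]
\end{aligned} \tag{4}$$

The number of harmonics required is estimated using the average Fourier power spectrum. The Fourier power of a harmonic is proportional to its squared amplitude and provides a measure of the amount of shape information, as described by the following equation:

$$\mathrm{Power}_n = \frac{a_n^2 + b_n^2 + c_n^2 + d_n^2}{2} \tag{5}$$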
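The coefficient formulas of Kuhl and Giardina for a linearly interpolated closed contour can be sketched in NumPy as follows. This is an illustrative implementation, not the SHAPE software used in the paper; the function name and the input layout (an ordered array of boundary points with the first point not repeated) are assumptions.

```python
import numpy as np


def elliptic_fourier_descriptors(contour, n_harmonics):
    """Return an (n_harmonics, 4) array whose rows are (a_n, b_n, c_n, d_n).

    contour: (K, 2) array of ordered (x, y) boundary points of a closed
    contour; the closing segment back to the first point is added here.
    """
    pts = np.asarray(contour, dtype=float)
    closed = np.vstack([pts, pts[:1]])           # append the start point to close the loop
    d = np.diff(closed, axis=0)                  # per-segment steps (dx_p, dy_p)
    dt = np.hypot(d[:, 0], d[:, 1])              # segment lengths dt_p
    if np.any(dt == 0):
        raise ValueError("consecutive duplicate points are not allowed")
    t = np.concatenate([[0.0], np.cumsum(dt)])   # t_0 .. t_K
    T = t[-1]                                    # perimeter, the basic period
    coeffs = np.zeros((n_harmonics, 4))
    for n in range(1, n_harmonics + 1):
        w = 2.0 * n * np.pi / T
        dcos = np.cos(w * t[1:]) - np.cos(w * t[:-1])
        dsin = np.sin(w * t[1:]) - np.sin(w * t[:-1])
        scale = T / (2.0 * n**2 * np.pi**2)
        coeffs[n - 1] = scale * np.array([
            np.sum(d[:, 0] / dt * dcos),   # a_n
            np.sum(d[:, 0] / dt * dsin),   # b_n
            np.sum(d[:, 1] / dt * dcos),   # c_n
            np.sum(d[:, 1] / dt * dsin),   # d_n
        ])
    return coeffs
```

As a sanity check, a finely sampled circle of radius r that starts on the positive x-axis should give a first harmonic close to (r, 0, 0, r) and negligible higher harmonics. The sketch omits the normalization for size, rotation and starting point that is normally applied before shapes are compared.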
In this case, the Fourier harmonics are truncated at N=14, at which point the average cumulative power is 90% or more of the total average power. The total power is calculated up to Nmax, which is equal to half the number of boundary points. The boundary contour detection and the Fourier series approximation with 90% of the Fourier power are presented in Fig. 5. With the maximum harmonic number N=14, the elliptic Fourier coefficients were determined using Eqs. (1-4) and are presented in Table S2.
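The truncation rule can be illustrated with a short sketch that takes the coefficients of one contour, computes the power of each harmonic as half the sum of its squared coefficients, and returns the smallest number of harmonics whose cumulative share reaches a threshold. The function name is hypothetical, and the paper averages the power over all samples, whereas this example handles a single contour.

```python
import numpy as np


def harmonics_for_power(coeffs, threshold=0.90):
    """Smallest N whose cumulative Fourier power share is >= threshold.

    coeffs: (N_max, 4) array with rows (a_n, b_n, c_n, d_n).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    power = 0.5 * np.sum(coeffs**2, axis=1)        # power of each harmonic
    total = power.sum()
    if total == 0:
        raise ValueError("all coefficients are zero")
    share = np.cumsum(power) / total               # cumulative share of total power
    reached = np.nonzero(share >= threshold)[0]
    # Fall back to all harmonics if rounding keeps the share just below the threshold
    return int(reached[0]) + 1 if reached.size else len(power)
```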

Fig. 5. Contour lines of the jar approximations with different maximum numbers of harmonics
Step 2. Principal component analysis. The principal component analysis is calculated from the elliptic Fourier coefficients of the pottery models. The EFD extraction yields one vector per contour; with N=14 harmonics, the vector is $[a_1, b_1, c_1, d_1, \ldots, a_{14}, b_{14}, c_{14}, d_{14}]$. The EFD vector of the i-th sample can be expressed as $G_i = [a_{i1}, b_{i1}, c_{i1}, d_{i1}, \ldots, a_{in}, b_{in}, c_{in}, d_{in}]$, where $i = 1, 2, \ldots, p$ indexes the samples and $n$ is the maximum number of harmonics. The matrix of the Fourier descriptors is expressed as

$$G = \begin{bmatrix} a_{11} & b_{11} & c_{11} & d_{11} & \cdots & a_{1n} & b_{1n} & c_{1n} & d_{1n} \\ a_{21} & b_{21} & c_{21} & d_{21} & \cdots & a_{2n} & b_{2n} & c_{2n} & d_{2n} \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\ a_{p1} & b_{p1} & c_{p1} & d_{p1} & \cdots & a_{pn} & b_{pn} & c_{pn} & d_{pn} \end{bmatrix} \tag{6}$$

The principal component analysis of the matrix in Eq. (6) yields the cumulative contribution rate of the principal components and the component score coefficient matrix. When the cumulative contribution rate reaches 90%, the retained principal components can represent the characteristics of the contour lines, thereby reducing dimensionality. Finally, the principal component scores (Table S3) are calculated based on the maximum number of harmonics (N=14) and the cumulative contribution rate of the principal components (Fig. S2).
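The dimensionality reduction in Step 2 can be sketched with a singular value decomposition of the centered descriptor matrix. This is an assumption-laden illustration: the paper does not state whether its PCA used the covariance or the correlation matrix, and this example uses the covariance form with a hypothetical function name.

```python
import numpy as np


def pca_scores(G, cumulative=0.90):
    """Project rows of G onto the leading principal components.

    G: (p, 4n) matrix with one EFD vector per sample.
    Returns (scores, cumulative_ratio) for the smallest number of
    components whose cumulative contribution rate reaches `cumulative`.
    """
    G = np.asarray(G, dtype=float)
    centered = G - G.mean(axis=0)                  # remove the column means
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    variance = s**2                                # proportional to the eigenvalues
    if variance.sum() == 0:
        raise ValueError("all samples are identical")
    ratio = np.cumsum(variance) / variance.sum()   # cumulative contribution rate
    k = min(int(np.searchsorted(ratio, cumulative)) + 1, len(s))
    scores = centered @ Vt[:k].T                   # principal component scores
    return scores, ratio[:k]
```

The returned scores play the role of the values in Table S3 and would be the input to the clustering in the next step.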
Step 3. K-means clustering. Based on the principal component scores, the jar shapes are classified using the k-means clustering method. As is usual for the k-means algorithm, the Euclidean distance over all attributes is selected as the similarity measure. The principal component scores are processed in the k-means clustering analysis using SPSS software. According to the k-means clustering results (Table S4), the main features of the classes are as follows.
Class ① has the following main characteristics: the overall shape is dumpy with a round shoulder, and the degree of decrease is not obvious from the abdominal diameter to the bottom diameter.
Class ② has the following main characteristics: the overall shape closely resembles a square with an angular shoulder, and the degree of decrease is obvious from the abdominal diameter to the bottom diameter.
Class ③ has the following main characteristics: the overall shape is lanky with an angular shoulder, and the degree of decrease is obvious from the abdominal diameter to the bottom diameter.
Class ④ has the following main characteristics: the overall shape is lanky with a round shoulder, and the degree of decrease is not obvious from the abdominal diameter to the bottom diameter.
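The k-means step that produces these classes was run in SPSS. The following is a minimal NumPy sketch of the standard algorithm with Euclidean distance, given only for illustration; the function name, the random initialization and the fixed seed are assumptions, and SPSS may choose its initial centers differently.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Basic k-means (Lloyd's algorithm) on the rows of X.

    Returns (labels, centers). Initial centers are k distinct samples
    drawn at random with the given seed.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

With the principal component scores of the 62 jars as `X` and `k=4`, the labels would correspond to classes ① to ④, up to a permutation of the label numbers.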

Results and Discussion
Based on the quantitative analysis and the elliptic Fourier descriptors, the classification results reveal different characteristics of the jars. In the clustering analysis of the ratios, the differences in the overall shape and in the degree of the saddle mouth are obvious, while the difference in the length of the ear is not. In the k-means clustering, the differences in the overall shape, the curve of the shoulders and the shape of the abdomen are obvious. Based on these factors, the jars are classified according to four parameters: the overall shape, the shoulder shape, the lower abdomen shape and the mouth shape. Each parameter is composed of two features. In terms of the overall shape, the jars are divided into square bodies (height ≈ abdominal diameter) and thin bodies (height > abdominal diameter). In terms of the shoulder shape, the jars are divided into round shoulders (the curvature of the shoulders changes slowly) and folded shoulders (the curvature of the shoulders changes quickly). With respect to the shape of the lower abdomen, the contraction is either not obvious, i.e., the abdominal diameter is slightly larger than the bottom diameter, or obvious, i.e., the abdominal diameter is considerably larger than the bottom diameter. With respect to the shape of the mouth, the change in the saddle mouth is either small, i.e., the highest height is slightly greater than the shortest height, or large, i.e., the highest height is considerably greater than the shortest height.
Of the four parameters, the mouth shape is analyzed according to the clustering analysis of the ratios, and the shoulder shape and the lower abdomen shape are analyzed according to the k-means clustering; these three parameters are independent of each other. The overall shape is analyzed according to both the clustering analysis of the ratios and the k-means clustering, so the two sets of results can be compared for this parameter. For example, in the clustering analysis of the ratios, group 2 (classes 2 and 5) contains both lanky and square jars, whereas in the k-means clustering, square and lanky jars fall into different classes. In Table 1, G32:2 belongs to class 2 of the hierarchical cluster analysis (HCA) and to a square-bodied k-means class, so it is assigned to the square bodies; G34:5 belongs to HCA class 2 and k-means class ④, so it is assigned to the thin bodies. It is important to note that a few objects cannot be combined with other classes, and thus, they are regarded as special individuals. The final classification result is determined jointly by the clustering analysis of the ratios and the k-means clustering (Table 1). Following a comprehensive comparison of the classification results, the overall shape of the sample is divided into square bodies and thin bodies. Square bodies are included in groups 1 (classes 1, 3, 8, and 9) and 2 (classes 2 and 5) in the clustering analysis of the ratios and in classes ① and ② in the k-means clustering. Thin bodies are included in groups 2 (classes 2 and 5) and 3 (classes 4, 6 and 7) in the clustering analysis of the ratios and in classes ③ and ④ in the k-means clustering. In terms of the shoulder shape, round shoulders are included in classes ① and ④ in the k-means clustering, and folded shoulders are included in classes ② and ③.
Regarding the shape of the lower abdomen, when the contraction of the lower abdomen is not obvious, the jars are classified into classes ① and ④ in the k-means clustering, and when the contraction of the lower abdomen is obvious, the jars are classified into classes ② and ③ in the k-means clustering. In terms of the shape of the mouth, when the change in the saddle mouth is small, the jars are classified into group 3 (classes 4, 6 and 7) in the clustering analysis of the ratios, and when the change in the saddle mouth is large, the jars are classified into groups 1 (classes 1, 3, 8, and 9) and 2 (classes 2 and 5) in the clustering analysis of the ratios. Individual or small sample classes are incorporated into similar classes according to their morphological characteristics.
Because there are four parameters and each parameter is composed of two features, the final number of classes is 16 ($C_2^1 C_2^1 C_2^1 C_2^1 = 2^4 = 16$). The Roman numerals denote the number of classes formed by the characteristic parameters (Fig. S3). According to the set parameters and the two-step classification, the final classification results are as follows (Table 2). It should be noted that in Table 2 6).
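The count of 16 follows from choosing one of two features for each of the four parameters. A short enumeration makes this explicit; the feature labels below paraphrase the text, and the dictionary layout is only an illustrative assumption, not the numbering used in Table 2.

```python
from itertools import product

# Two features for each of the four parameters (labels paraphrased from the text)
PARAMETERS = {
    "overall shape": ("square body", "thin body"),
    "shoulder": ("round shoulder", "folded shoulder"),
    "lower abdomen": ("contraction not obvious", "contraction obvious"),
    "saddle mouth": ("small change", "large change"),
}


def all_classes(parameters=PARAMETERS):
    """List every combination of one feature per parameter."""
    names = list(parameters)
    return [dict(zip(names, combo)) for combo in product(*parameters.values())]
```

Calling `len(all_classes())` gives the 16 theoretical classes; not every combination needs to occur among the 62 jars.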

Conclusions
With the development of science and technology, 3D-scanning technology is being more widely used in archaeological research. Researchers can choose a 3D scanner and postmodel analysis software according to their own research needs. In addition to the research in this paper, a pottery model can be used to create a 3D virtual display, print a 3D model, perform the virtual restoration of cultural relics, and record 3D data. A 3D pottery model is permanent and can be used multiple times as a basic model for further research.
Because 3D scanning technology is now mature, researchers pay more attention to the research content based on 3D scanning and to the scientific problems that 3D models can solve. The focus of this paper is to convey a research concept: a new exploration of, and a new attempt at, an objective classification method for pottery. As such, this study provides a new auxiliary tool for traditional archaeological typology research. The computer program in this paper was written within the authors' existing computing capabilities, and other researchers can use a variety of methods to extract data according to their own computing capabilities. Computer programs are faster and more accurate than manual measurement, and they avoid the repeated handling of objects that manual measurement requires. As 3D scanners are updated and developed, additional pottery features, such as decorative designs and thicknesses, may be scanned and recorded, which would increase the number of characteristic factors available for pottery classification.
Traditionally, the accurate classification of pottery shapes largely depended on the experience and knowledge of the archaeologists. The mathematical method, however, is an objective method that differs from the methods used in traditional archaeology. It is a tool used to illustrate the relationship between objects, and it enables us to study typology more objectively while avoiding the influence of subjective factors introduced by different researchers. As the pottery of the Gansu-Zhanqi site has not been analyzed through traditional archaeological methods, the results of this pottery typology are the first to be published for the site. Therefore, the results of our study provide a reference for traditional archaeology, and they contribute to the study of the combination of quantitative and qualitative research in archaeological typology. However, it should be noted that although this method can explain the results of the analysis, it cannot furnish conclusions for archaeological research. Over the past few years, researchers have applied mathematical methods to various areas of traditional archaeology. However, the combination of two mathematical methods is relatively rare. In this paper, we analyze 2D data (contours) in combination with 1D data (lengths), both derived from the 3D data of the pottery models, through mathematical methods. We hope that these results will form a series that can provide a reference for traditional archaeological research, thereby enhancing the accuracy and value of the results.

Figure legends
Fig. 1. Pottery from the Gansu-Zhanqi site [19]. 1. Saddle jar; 2. Flat jar; 3. Colored pottery; 4. Dou; 5. Bottle; 6. Bottle; 7. Flat jar; 8. Bottle; 9. Li*; 10. Li*. (*Li is a ceramic cooking vessel used in ancient China. It appeared in the late Neolithic Age and continued to be popular in the Shang and Zhou dynasties. Most have an open mouth, a round abdomen and three baggy feet.)
Fig. 2. Postprocessing the scanning data. a. Deficiencies, such as holes and cracks, are detected using Autodesk Meshmixer. b. The model is complete after being restored in Autodesk Meshmixer. c. The lengths are based on the XZ plane in 3D Builder. d. The contours are based on the YZ plane in 3D Builder.
Fig. 3. Data acquisition process in the C programming language. a. The point cloud data are read. b. The contour is generated. c. The contour points are generated according to the calculation. d. The contour line is enlarged. e. The plane is formed according to the contour points. f. The contour lines of the pottery model are the output.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.