Study sites. The South China Sea (Randall and Lim 2000) is a semi-enclosed sea that is part of the Pacific Ocean. It is bordered by Brunei Darussalam, Cambodia, China, Indonesia, Malaysia, the Philippines, Singapore, Thailand, and Vietnam, and contains numerous small islands (Nguyễn 2004). To obtain a comprehensive dataset on the distribution patterns of parrotfish in the South China Sea, we collected data from 51 sites, including Tioman Island in Malaysia, the Natuna Islands and Anambas Islands, Redang Island, the Nansha Islands, Taiping Island, Subi Reef, Zhongye Island, Brunei Darussalam, El Nido in the Philippines, the Vietnamese coastal areas (including Con Dao, An Thoi, Cu Lao Cau Bay, Nha Trang, etc.), Cambodia, Koh Tao in Thailand, the Xisha Islands, Qilianyu, Hainan Island, the Dongsha Islands, Weizhou Island, Daya Bay, the Minjiang River estuary, the Jiulong River estuary, the Pearl River estuary, Hong Kong, the Taiwan Islands (subdivided into the southern, northern, eastern and western parts of Taiwan), Kenting National Park, Lanyu, Green Island, Ryukyu and South Penghu National Park. All sites lay between 99.84 °E and 121.73 °E and between 2.78 °N and 26.06 °N.
Data collection. Species records of parrotfish (i.e., presence/absence data) in the South China Sea were compiled from published works, regional checklists, monographs on specific families, scientific reports, and databases. In addition, unpublished data from our team were used, but only in compiling the species records for the Xisha Islands and Qilianyu. When searching for scientific reports, the main keywords were reef fish, parrotfish, distribution, South China Sea and the place names mentioned above. We also searched by country or region in FishBase and in the Taiwan Fish Database (http://fishdb.sinica.edu.tw); the latter was used mainly for information on the distribution of parrotfish in the Taiwan Islands and surrounding islands. The full data set and a detailed list of synonyms are available as supplementary material; see also FishBase (https://fishbase.cn/summary/FamilySummary.php?ID=364).
According to jaw morphology, foraging activity and the extent of substratum excavation, parrotfish are commonly classified into three main functional groups: browsers, scrapers and excavators (Bonaldo et al. 2014; Kulbicki et al. 2018). Species of the genera Hipposcarus and Scarus are mostly scrapers; those of Calotomus and Leptoscarus are mostly browsers; and those of Bolbometopon, Cetoscarus and Chlorurus are mostly excavators (Bellwood and Choat 1990; Ong and Holland 2010).
The environmental factors considered in this study were geographical location (i.e., latitude and longitude), scleractinian coral species richness, reef area, and sea surface temperature. Latitude and longitude were mostly taken from Wikipedia (http://en.wikipedia.org/). Species records of scleractinian corals and reef area were consolidated from the literature, reports and books, using an extensive search with keywords such as coral reefs, reef-building corals, marine reserves, area and the place names of the study sites. ReefBase (http://www.reefbase.org/main.aspx) was used to supplement these data. However, reef area could not be obtained from online sources for some sites (such as Green Island, Lanyu and Hong Kong). Sea surface temperature was mainly obtained from the following websites: National Oceanic and Atmospheric Administration (http://www.noaa.gov/), Weather-stats (https://weather-stats.com/seamap) and World sea water temperatures (https://seatemperature.info/).
Nestedness analysis. Based on the collected parrotfish data, a nestedness model was used to explore the distribution pattern of parrotfish in the South China Sea. Nestedness is not a stable, universal structure; it depends closely on the study object (such as category, island habitat type, matrix size, etc.) (Chen and Wang 2004). We therefore first removed sites that had too few published data or did not conform to island habitat types, such as eastern Taiwan, southern Taiwan, northern Taiwan, Cambodia and Brunei Darussalam. This left 24 sites for the nestedness analysis, and the same sites were used in all subsequent analyses. The presence or absence of each species at each site was coded as a binary value (1/0). Nestedness is commonly quantified with the nestedness temperature T. The temperature of a matrix measures its degree of disorder, reflecting how far the analyzed matrix deviates from a completely nested matrix (Zhang et al. 2008): the lower the temperature, the higher the degree of nestedness. T thus ranges from 0 for a completely nested matrix to 100 for one that is completely disordered (Boecklen 1997; Wright et al. 1998). We quantified nestedness with the matrix temperature calculated by the BINMATNEST (binary matrix nestedness temperature calculator) software. BINMATNEST rearranges the input matrix to maximal packing, so that species occurrences are concentrated as far as possible in the top left corner of the matrix, and then calculates the nestedness temperature. To test the significance of the input matrix, the software randomly generates 1,000 null matrices. BINMATNEST provides three null models for this test, of which null model 3 has been shown to effectively control for the influence of passive sampling (Moore and Swihart 2007; Rodríguez-Gironés and Santamaría 2010).
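The packing algorithm behind the BINMATNEST temperature is not reproduced here. As a rough illustration of the workflow described above (a presence/absence matrix, a nestedness score, and a significance test against 1,000 random matrices), the following sketch swaps in the NODF metric, which runs from 0 (no nestedness) to 100 (perfect nestedness), i.e. in the opposite direction to T. The function names, the toy matrix and the null model (each cell filled with a probability equal to the mean of its row and column fill) are assumptions for illustration and may differ from BINMATNEST's null model 3.

```python
import numpy as np


def nodf(matrix):
    """NODF nestedness of a binary matrix (0 = none, 100 = perfect)."""
    m = np.asarray(matrix, dtype=int)

    def paired_sum(a):
        # Sort rows by decreasing marginal total, then score every pair.
        totals = a.sum(axis=1)
        order = np.argsort(-totals, kind="stable")
        a, totals = a[order], totals[order]
        score = 0.0
        for i in range(len(a)):
            for j in range(i + 1, len(a)):
                # Only pairs with strictly decreasing, non-zero fill count.
                if totals[i] > totals[j] > 0:
                    overlap = np.sum(a[i] & a[j])
                    score += 100.0 * overlap / totals[j]
        return score

    n_rows, n_cols = m.shape
    n_pairs = n_rows * (n_rows - 1) / 2 + n_cols * (n_cols - 1) / 2
    # Rows are scored directly, columns via the transpose.
    return (paired_sum(m) + paired_sum(m.T)) / n_pairs


def null_model_test(matrix, n_null=1000, seed=0):
    """Compare observed NODF with NODF of random matrices (assumed null)."""
    m = np.asarray(matrix, dtype=int)
    rng = np.random.default_rng(seed)
    row_fill = m.mean(axis=1, keepdims=True)
    col_fill = m.mean(axis=0, keepdims=True)
    prob = (row_fill + col_fill) / 2  # assumed cell-occupancy probability
    observed = nodf(m)
    null_scores = np.array(
        [nodf(rng.random(m.shape) < prob) for _ in range(n_null)]
    )
    # One-sided p-value: share of null matrices at least as nested.
    p_value = (np.sum(null_scores >= observed) + 1) / (n_null + 1)
    return observed, p_value


# Hypothetical toy matrix: rows are sites, columns are species.
toy = [[1, 1, 1, 1],
       [1, 1, 1, 0],
       [1, 1, 0, 0],
       [1, 0, 0, 0]]
observed, p_value = null_model_test(toy, n_null=1000, seed=0)
print(observed, p_value)
```

Because NODF increases with nestedness, a significantly nested matrix shows an observed value in the upper tail of the null distribution, whereas with the temperature T it would fall in the lower tail.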
The order of sites was calculated by BINMATNEST, and species were ranked according to occurrence frequency, maximum body length and morphological characteristics (the ratio of body length to body depth); the resulting order is referred to as the species nested matrix rank. Species were first ranked by occurrence frequency, with the most frequent species at the top. When occurrence frequencies were equal, species were ranked by maximum body length, with the larger species first. When maximum body lengths were also equal, the species with the larger ratio of body length to body depth was ranked first. Maximum body length and the ratio of body length to body depth were both obtained from FishBase.
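The three-level ranking rule above amounts to a sort with two tie-breakers. A minimal sketch follows; the record type, field names and the placeholder species values are hypothetical and are not taken from the data set.

```python
from typing import NamedTuple


class SpeciesRecord(NamedTuple):
    name: str
    occurrence: int            # number of sites where the species occurs
    max_length_cm: float       # maximum body length
    length_depth_ratio: float  # body length divided by body depth


def rank_species(records):
    """Order species by occurrence, then max length, then length/depth ratio.

    All three keys are sorted in descending order, so each later key
    only matters when the earlier ones are tied.
    """
    return sorted(
        records,
        key=lambda r: (-r.occurrence, -r.max_length_cm, -r.length_depth_ratio),
    )


# Placeholder records (invented values, for illustration only).
records = [
    SpeciesRecord("sp_A", 20, 40.0, 2.5),
    SpeciesRecord("sp_B", 20, 70.0, 2.8),
    SpeciesRecord("sp_C", 12, 30.0, 3.1),
    SpeciesRecord("sp_D", 12, 30.0, 2.6),
]
for rank, record in enumerate(rank_species(records), start=1):
    print(rank, record.name)
```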
Statistical analyses. A paired-samples t-test was used to test whether the number of species per site differed significantly among the three functional groups. The effects of environmental factors (latitude, longitude, sea surface temperature, scleractinian coral species richness, reef area) and species life-history traits on the formation of the nested pattern were evaluated by Spearman’s rank correlation analysis (Schouten et al. 2007; Li et al. 2013), conducted between the nested matrix rank of sites and the environmental factors, and between the nested matrix rank of species and maximum body length. According to the nested matrix rank of sites, we divided all sites into two groups and used an independent-samples t-test to compare whether the mean values (scraper, excavator, browser, scraper/total, excavator/total, browser/total) differed significantly between the two groups.
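The analyses in this study were run in SPSS; purely as an illustration, the same three tests can be sketched with SciPy. The helper names and all numbers below are hypothetical, and splitting the sites into two equal halves by nested rank is an assumption, since the text does not state where the split was made.

```python
from scipy import stats


def rank_correlation(nested_rank, factor):
    """Spearman correlation between nested matrix rank and one factor."""
    rho, p_value = stats.spearmanr(nested_rank, factor)
    return rho, p_value


def compare_rank_groups(nested_rank, values):
    """Independent-samples t-test between low-rank and high-rank sites.

    Assumption: sites are split into two halves by nested matrix rank.
    """
    ordered = [v for _, v in sorted(zip(nested_rank, values))]
    half = len(ordered) // 2
    t_stat, p_value = stats.ttest_ind(ordered[:half], ordered[half:])
    return t_stat, p_value


# Invented example values for eight sites.
site_rank = [1, 2, 3, 4, 5, 6, 7, 8]
latitude = [5.2, 4.1, 9.8, 12.5, 10.9, 18.4, 21.7, 20.3]
scrapers = [25, 22, 20, 17, 15, 11, 9, 6]
excavators = [6, 6, 5, 4, 4, 2, 2, 1]

print(rank_correlation(site_rank, latitude))
print(compare_rank_groups(site_rank, scrapers))
# Paired-samples t-test: the two counts are matched by site.
print(stats.ttest_rel(scrapers, excavators))
```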
We applied a basic linear model to the data from all sites to quantify the relationship between species richness and environmental factors. Parrotfish species richness was the dependent variable, and the environmental factors (scleractinian coral species richness, reef area, sea surface temperature, latitude and longitude) were the independent variables.
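A linear model of this form can be sketched as an ordinary least-squares fit. The version below uses NumPy rather than SPSS; the function name and the small predictor table are hypothetical, and only two predictors are shown for brevity.

```python
import numpy as np


def fit_linear_model(predictors, response):
    """Ordinary least squares with an intercept.

    Returns the coefficients (intercept first) and the R-squared value.
    """
    y = np.asarray(response, dtype=float)
    x = np.asarray(predictors, dtype=float)
    design = np.column_stack([np.ones(len(y)), x])  # add intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_total = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - np.sum(residuals ** 2) / ss_total
    return coef, r_squared


# Invented example: columns are coral species richness and mean SST (deg C).
environment = [[120, 26.1], [250, 28.0], [310, 28.4], [90, 24.7], [200, 27.2]]
richness = [8, 21, 27, 5, 16]
coef, r_squared = fit_linear_model(environment, richness)
print(coef, r_squared)
```

With only 24 sites and five correlated predictors, the individual coefficients of such a fit should be read with caution.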
Principal component analysis (PCA) is a powerful multivariate statistical technique that can replace a dataset with a smaller set of uncorrelated principal components. First, the Kaiser-Meyer-Olkin (KMO) test and Bartlett's test of sphericity were performed to assess the suitability of the data for PCA (Zhu et al. 2015). PCA was then used to reveal the important components responsible for the distribution characteristics of parrotfish (ALabdeh et al. 2020).
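For readers who want to see what these steps compute, the sketch below derives the overall KMO measure, Bartlett's sphericity statistic and the PCA variance shares from a correlation matrix. It is an illustration in NumPy, not the SPSS procedure used here; the function names and the simulated data are assumptions.

```python
import numpy as np


def kmo_and_bartlett(data):
    """Overall KMO measure and Bartlett's sphericity statistic.

    data: array of shape (n_observations, n_variables).
    Returns (kmo, chi_square, degrees_of_freedom).
    """
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    corr = np.corrcoef(x, rowvar=False)

    # Partial correlations from the inverse correlation matrix.
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(p, dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    kmo = r2 / (r2 + a2)

    # Bartlett: chi2 = -(n - 1 - (2p + 5) / 6) * ln|R|, df = p(p - 1) / 2.
    _, log_det = np.linalg.slogdet(corr)
    chi_square = -(n - 1 - (2 * p + 5) / 6) * log_det
    dof = p * (p - 1) // 2
    return kmo, chi_square, dof


def pca_explained_variance(data):
    """Share of variance per principal component (correlation-matrix PCA)."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # largest first
    return eigenvalues / eigenvalues.sum()


# Simulated data: 24 sites, 5 correlated variables (illustration only).
rng = np.random.default_rng(0)
shared = rng.normal(size=(24, 1))
simulated = shared + 0.7 * rng.normal(size=(24, 5))
print(kmo_and_bartlett(simulated))
print(pca_explained_variance(simulated))
```

The p-value of Bartlett's test follows from comparing the statistic with a chi-square distribution with the returned degrees of freedom, for example via `scipy.stats.chi2.sf`.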
The above calculations and analyses were performed in IBM SPSS Statistics 26. In all significance tests, we followed the common convention that P < 0.05 indicates a statistically significant difference, P < 0.01 a strongly significant difference and P > 0.05 a non-significant difference.
R was used to draw the map of the study region and the distribution diagram of parrotfish species richness. Origin 2018 was used to perform linear regression or nonlinear fitting of parrotfish species richness against the environmental factors.