3.1 Polygon data
The polygon data are stored in the shapefile format. The attributes include the ID number (polyID), the average values of the terrain measurements within each polygon (lnSLOPE, lnHAND, TEXTURE, CONVEXITY), the mode value of Sinks, the Noise code, and the COMID number of MERIT-Basins. The regional clusters (15 and 40 categories) are provided as a separate DBF file. Details of the data are as follows.
1) Properties of the original polygon data (shapefile dataset, Poly_XX (XX: the index number))
polyID: ID number for each polygon
lnSLOPE (calculated as in Table 1): average value
lnHAND (calculated as in Table 1): average value
TEXTURE (calculated as in Table 1): average value
CONVEXITY (calculated as in Table 1): average value
Sinks (calculated as in Table 1): (0) The majority of the area within a polygon is not sinks. (1) The majority of the area within a polygon is sinks.
Noise (manual interpretation; datasets with apparent noise only): (0) There is no apparent noise or ice sheet. (1) There is apparent noise on CONVEXITY. (2) There is apparent noise on TEXTURE and CONVEXITY. (3) Apparent ice sheet. (4) There is apparent noise on TEXTURE. (5) There is apparent noise on lnSLOPE. (6) There is apparent noise on lnSLOPE, TEXTURE, and CONVEXITY. * (0) to (2) are found in a wide range of areas; (3) is found in the polar regions; (4) to (6) are found only in desert areas.
COMID: ID number for each unit catchment of MERIT-Basins (Lin et al., 2019). Polygons with a COMID of '-99999' lie in regions that MERIT-Basins does not cover.
2) Properties of the cluster file (dBASE IV, Cluster_XX)
polyID: ID number for each polygon
ZlnSLOPE: standardized lnSLOPE. Noise areas (2) to (6) are not included.
ZlnHAND: standardized lnHAND. Noise areas (2) to (6) are not included.
Ztexture: standardized TEXTURE. Noise areas (2) to (6) are not included.
Cluster15: Cluster number from k-means clustering into 15 categories, using standardized lnSLOPE, lnHAND, and TEXTURE with the area of each polygon as the weight. The cluster numbers (1–15) are specific to each basin and do not correspond across basins. Noise areas (2) to (6) are not included.
Cluster40: Cluster number from k-means clustering into 40 categories, using standardized lnSLOPE, lnHAND, and TEXTURE with the area of each polygon as the weight. The cluster numbers (1–40) are specific to each basin and do not correspond across basins. Noise areas (2) to (6) are not included.
3) Properties of the cluster convergence values (CSV, XX_convergence_values_cluster15, XX_convergence_values_cluster40)
Convergence values for each clustering of ZlnSLOPE, ZlnHAND, Ztexture.
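As an illustration of how the standardized fields in the cluster file relate to the polygon attributes, the following is a minimal sketch of a z-score calculation restricted to polygons with Noise (0) or (1). It assumes a plain unweighted mean and the population standard deviation; the standardization actually used for the dataset may differ (for example, it may be area-weighted). The function name, record layout, and values are hypothetical.

```python
from statistics import fmean, pstdev

# Noise codes kept for clustering: (0) and (1); codes (2) to (6) are excluded.
KEPT_NOISE_CODES = {0, 1}

def standardize(records, field):
    """Return {polyID: z-score of `field`} for polygons with Noise 0 or 1.

    Assumption: unweighted mean and population standard deviation.
    """
    kept = [r for r in records if r["Noise"] in KEPT_NOISE_CODES]
    values = [r[field] for r in kept]
    mean = fmean(values)
    std = pstdev(values)
    return {r["polyID"]: (r[field] - mean) / std for r in kept}

# Hypothetical attribute rows (values are made up for illustration).
polygons = [
    {"polyID": 1, "lnSLOPE": 1.0, "Noise": 0},
    {"polyID": 2, "lnSLOPE": 3.0, "Noise": 1},
    {"polyID": 3, "lnSLOPE": 9.0, "Noise": 3},  # ice sheet: excluded
]
z_lnslope = standardize(polygons, "lnSLOPE")
```

Polygons with Noise (2) to (6) do not appear in the result, which mirrors their absence from the cluster file.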
3.2 Usage of the dataset and notes
The clustering result (DBF, Cluster_XX) can be joined with the shapefile (Poly_XX) in GIS software using polyID as the key. The joined layer is itself a simple terrain classification map (Fig. 3). As shown in Iwahashi et al. (2021), a terrain classification map can be constructed by grouping the 40 clusters through comparison with existing thematic data and scatter plots of the clustering convergence values. In areas where CONVEXITY does not contain noise, CONVEXITY may also be used to reclassify intermediate slopes, such as eroding unconsolidated terraces and alluvial fans, as in Iwahashi et al. (2021).
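The join on polyID can also be scripted. The sketch below is a minimal illustration using plain Python dictionaries in place of the attribute tables; reading the shapefile and DBF is omitted, and the function name and sample rows are hypothetical. It performs a left join so that polygons absent from the cluster file (Noise (2) to (6)) are kept with an empty cluster value.

```python
def join_clusters(polygon_rows, cluster_rows, key="polyID"):
    """Left-join cluster attributes onto polygon attributes by `key`."""
    clusters = {row[key]: row for row in cluster_rows}
    joined = []
    for poly in polygon_rows:
        match = clusters.get(poly[key], {})
        merged = dict(poly)
        # Polygons missing from the cluster table (Noise 2 to 6) get None.
        merged["Cluster15"] = match.get("Cluster15")
        merged["Cluster40"] = match.get("Cluster40")
        joined.append(merged)
    return joined

# Hypothetical rows standing in for Poly_XX and Cluster_XX records.
poly_rows = [
    {"polyID": 101, "Noise": 0},
    {"polyID": 102, "Noise": 2},
]
cluster_rows = [{"polyID": 101, "Cluster15": 7, "Cluster40": 23}]
joined = join_clusters(poly_rows, cluster_rows)
```

A left join is chosen here so that no polygon geometry is dropped from the map; an inner join would instead keep only the clustered polygons.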
In continental regions, Sinks (Table 1) extracted by the “Fill Sinks (Wang and Liu)” tool using the 90 m DEM cover a wide area of plains. Sinks are limited in orogenic zones where alluvial fans develop, such as Japan. It is unclear to what extent Sinks are related to water resources, owing to regional differences in climate and geology, but when combined with areas where HAND is small (blue areas in Fig. 4), they may be useful for environmental assessment. We provide users with both Sinks and lnHAND (Table 1). Because lnHAND = ln(HAND + 1), HAND can be recovered as exp(lnHAND) − 1.
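The conversion between lnHAND and HAND follows directly from the definition lnHAND = ln(HAND + 1). A minimal sketch is given below; the function names are hypothetical, and `math.expm1` and `math.log1p` are used only because they are numerically accurate for small values.

```python
import math

def hand_from_lnhand(ln_hand):
    """Invert lnHAND = ln(HAND + 1); the result is in the units of HAND."""
    return math.expm1(ln_hand)

def lnhand_from_hand(hand):
    """Forward transform, included for round-trip checks."""
    return math.log1p(hand)

# Hypothetical lnHAND values converted back to HAND.
hand_values = [hand_from_lnhand(v) for v in (0.0, 1.0, 2.5)]
```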
Since the shapefile dataset contains COMID (Lin et al., 2019) as a field attribute, it should be possible to join the upstream and downstream location information of each catchment contained in the MERIT-Basins river data (Lin et al., 2019) using COMID as the key. Within a single unit catchment, the upstream-downstream relationship can probably be determined from the value of lnHAND.
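One way to explore this is to group polygons by COMID and order them by lnHAND within each unit catchment. The sketch below is illustrative only: it assumes that COMID is stored as an integer, that a larger lnHAND corresponds to a position further upslope, and it uses hypothetical sample rows. Polygons with COMID -99999 are skipped because they lie outside MERIT-Basins coverage.

```python
from collections import defaultdict

NO_BASIN = -99999  # COMID value for polygons outside MERIT-Basins coverage

def order_within_catchments(polygon_rows):
    """Group polygons by COMID and sort each group from high to low lnHAND."""
    groups = defaultdict(list)
    for row in polygon_rows:
        if row["COMID"] != NO_BASIN:
            groups[row["COMID"]].append(row)
    return {
        comid: [
            r["polyID"]
            for r in sorted(rows, key=lambda r: r["lnHAND"], reverse=True)
        ]
        for comid, rows in groups.items()
    }

# Hypothetical attribute rows (values are made up for illustration).
rows = [
    {"polyID": 1, "COMID": 500, "lnHAND": 0.2},
    {"polyID": 2, "COMID": 500, "lnHAND": 3.1},
    {"polyID": 3, "COMID": 501, "lnHAND": 1.0},
    {"polyID": 4, "COMID": -99999, "lnHAND": 0.5},
]
ordering = order_within_catchments(rows)
```

The ordering between different unit catchments is not addressed here; that information would come from the MERIT-Basins river data.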
The notes for using the dataset are as follows. The DEM reflects the topography at the time of measurement. Therefore, the classification results are unlikely to serve as the expected ground proxies in artificially altered areas such as reclaimed land or cut-and-fill areas. Since the Noise areas were designated visually, some noise may have been overlooked or overestimated.
In some areas of brackish lakes and large rivers, the center line is not NoData but is represented by narrow polygons, owing to the OpenStreetMap data used to extract these areas. These narrow polygons need to be removed before use. Small polygons are also often found at the confluences of valleys. They coincide with the blank parts of MERIT-Basins and were produced when the segmented polygons were combined with MERIT-Basins. Because their areas are very small, they have no obvious influence on the clustering results.
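No specific procedure for removing the narrow polygons is prescribed here. One possible approach, sketched below, is to screen polygons by a shape compactness score (the Polsby-Popper ratio 4πA/P²) and to inspect or drop those below a threshold. This is an assumption for illustration, not the method used to build the dataset: the threshold value is hypothetical, the rings are assumed to be in planar (projected) coordinates, and legitimately elongated polygons could also be flagged, so results should be checked visually.

```python
import math

def ring_area_perimeter(ring):
    """Planar area and perimeter of a closed ring given as [(x, y), ...]."""
    area2 = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        area2 += x1 * y2 - x2 * y1  # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(area2) / 2.0, perimeter

def compactness(ring):
    """Polsby-Popper score 4*pi*A / P**2: 1 for a circle, near 0 for slivers."""
    area, perimeter = ring_area_perimeter(ring)
    return 4.0 * math.pi * area / perimeter ** 2

MIN_COMPACTNESS = 0.05  # hypothetical threshold; tune per region

# Hypothetical outer rings: a compact square and a long, thin sliver.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
sliver = [(0, 0), (1000, 0), (1000, 1), (0, 1)]
kept = [r for r in (square, sliver) if compactness(r) >= MIN_COMPACTNESS]
```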