Perceptions and Expressions Pertaining to Cultural Ecosystem Services in Urban Green Spaces Using Text Mining Techniques

Objectives: To distinguish differences in cultural services by type of urban green area through atypical expressions. Context: Urban green spaces provide important ecosystem services, with cultural ecosystem services (CES) playing a significant role in citizens' lives. Nevertheless, these are often undervalued because it is difficult to quantitatively evaluate an individual's subjective perception of urban space. By examining social media content, we can analyze what users create and grasp demand values. Methods: This study analyzed urban green areas in the inland area of Ansan city in Gyeonggi-do, South Korea. Data were collected twice, on October 3, 2017 and October 4, 2018, to verify that the extracted keywords were representative. We extracted keywords from blog posts related to CES and evaluated the possibility of using them as quantitative indicators. Results: The results indicate that the perceived expression words differed depending on the type of green space. Certain CES such as "exhibit" and "climbing" are affected by green space type. However, it was difficult to identify emotional responses to CES. We found that some words carried double meanings, which made it difficult to evaluate individuals' perceptions of CES based on the frequency of specific words. Conclusions: This study demonstrates that social media data on CES greatly extend the type and, especially, the volume and scale of information derived from traditional survey methods. The significance of this study lies in its attempt to quantitatively evaluate the recognition of CES in daily life.


Introduction
In urban green spaces, ecosystems not only fulfill the ecological function of sustaining the urban environment but also provide an important living area for urban residents. The concept of ecosystem services frames human-environment interactions through a series of linked components that relate ecological processes to human well-being. Socio-cultural approaches are useful for exploring user perceptions of and preferences for CES. Numerous methods have been used to evaluate CES, including questionnaires, photographic analysis, and visitor-employed photography with short comments. The most common is the perception survey, which has temporal and spatial limitations (Leetaru et al. 2013). This method is most suitable for small regional-scale studies; however, it has limitations in large-scale regional studies (Bragagnolo et al.). Text, which is unstructured data, accounts for 80% of the data in the world. In this study, we examined the relationships between the meanings of words contained in social media posts through natural language processing, to understand user perceptions of CES provided by urban green spaces. To do this, we first extracted keywords representing CES through text network analysis and identified the characteristics of CES for each type of urban green space. We then analyzed the relationship between the keywords and the greenery types derived from text mining. Finally, we aimed to determine whether differences in cultural services by type of urban green area can be distinguished through atypical expressions.

Study Site
This study analyzed urban green areas in the inland area of Ansan city in Gyeonggi-do, South Korea. Ansan was the country's first planned city. A 1977 development plan turned Ansan into an eco-friendly industrial city by separating the industrial area from residential and commercial zones. With a total area of 154.23 km² and a population of 740,000, Ansan was classified as a mid-sized city as of 2017. Ansan is bordered by the Yellow Sea to the west and mountains to the east. The areas designated for agricultural and industrial activities are concentrated in the city's periphery. The inland area is 109.34 km² and is mostly flat land, except for mountainous regions. The city includes several types of urban green spaces integrated into historical and cultural sites (Fig. 1).
As of 2017, the urban forest area per citizen was 9.02 km², which is relatively high compared to other municipalities in Korea. The parks are quite evenly distributed owing to the planned nature of the city, and the green spaces are all equally accessible, making Ansan suitable for this initial big data study.

Data Collection and Refinement
This study collected and evaluated online text data regarding users' perceptions of urban green spaces in the inland area of Ansan city. The text data were collected from public blogs obtained through a widely used Korean search engine, Naver (www.naver.com). Referred to as "the Google of South Korea," Naver handled 74.7% of all web searches in the country and had 42 million enrolled users as of September 2017 (https://dbpedia.org/page/Naver). It has a blog app that enables users to easily publish user-generated content, including posts, photos, product reviews, and location information. A blog contains an individual's daily information based on experiences, which is then shared with and spread to a large number of users through the portal site. Therefore, it can be used as a tool to understand users' individual experiences.
The data were extracted with a Python crawler through the API, following the ordering of the portal site's search engine, which ranks content by relevance. We only collected data that users had made public.
We collected data twice, on October 3, 2017 and October 4, 2018, to verify that the extracted keywords were representative. The datasets were then manually filtered using Python and Excel to retain only posts related to the use of the space. Search keywords for which the number of results related to the space's use was insufficient, because of advertisements unrelated to the actual use of the space or a lack of content on the topic, were excluded during preliminary surveys. A total of 17 search keywords related to green spaces were found, including three for forests, two for rivers, eight for parks, and four for greenery. Fig. 2 presents this study's workflow.

Semantic Analysis
The blogs used in this study contain information written in users' natural language. Natural language has lexical ambiguity, in which one word has various meanings (Agirre and Edmonds 2007). As the true meaning of natural language is the information contained in the everyday language of those who speak it, we must examine frequently used words, the order in which they are used, and which words appear together. Unstructured atypical text analysis requires an understanding of natural language processing and semantic complexity (Loebner 2002). Social network analysis is based on graph theory, which posits that a system of connected elements can be defined as a network (Freeman 1979). In the network, keywords are defined as nodes, and relationships between keywords are defined as links. The semantic analysis was performed using Net-miner 4.4, a social network analysis program (Fig. 3).
The collected data were refined via repeated preprocessing of the unstructured text. Although the characteristics of long texts differ depending on the author, in general, various keywords are combined to describe a single topic. Moreover, although detailed opinions can be found if all words are examined, words with a low frequency tend to be less useful (Luhn 1958). To clearly examine the relationships between keywords, it is necessary to simplify the sentences. Thus, unnecessary words were removed using a stop-word dictionary to simplify the relationships between terms. For example, "then," "now," "so to speak," and "to summarize" are stop words. In this study, we also excluded words such as "got" ("some space" in Korean) and "e-got" ("here" in Korean). We examined the top keywords that were common nouns, excluding the names of administrative districts, search terms, and proper nouns.
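The refinement step described above can be illustrated with a minimal sketch. It assumes the posts have already been tokenized (the Korean morphological analysis is not shown), and the names `STOP_WORDS`, `EXCLUDED_TERMS`, `refine_tokens`, and `top_keywords` are hypothetical; the study's actual stop-word dictionary is not reproduced here.

```python
from collections import Counter

# Hypothetical stop-word dictionary built from the examples given in the text.
STOP_WORDS = {"then", "now", "so to speak", "to summarize", "got", "e-got"}

# Hypothetical exclusions standing in for district names, search terms, and proper nouns.
EXCLUDED_TERMS = {"ansansi", "gyeonggido"}


def refine_tokens(tokens):
    """Lower-case tokens and drop stop words and excluded terms."""
    cleaned = (t.strip().lower() for t in tokens)
    return [t for t in cleaned if t and t not in STOP_WORDS and t not in EXCLUDED_TERMS]


def top_keywords(posts, n=10):
    """Count refined tokens over all posts and return the n most frequent."""
    counts = Counter()
    for tokens in posts:
        counts.update(refine_tokens(tokens))
    return counts.most_common(n)


# Toy posts (invented for illustration only).
posts = [
    ["Ansansi", "then", "climbing", "course", "relaxation"],
    ["now", "climbing", "got", "photos", "course"],
    ["strolling", "e-got", "climbing"],
]
print(top_keywords(posts, n=2))
```

In practice the stop-word list would be grown iteratively, as the text describes, by inspecting the top keywords after each pass.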
Centrality and community analyses were conducted, and representative keywords were extracted through a semantic analysis of the collected data. A total of 530 words/phrases were extracted from both sets of data (Table 1). Each type of data was refined using the same preprocessing, and because of the large number of nodes, the analysis was performed with respect to the parent node for visualization. In this process, because the frequency and centrality values of some nodes were too high, the network was skewed toward a single node, making it impossible to analyze all types in the same manner. Therefore, the top-ranked, high-value nodes that were connected to every other node were removed before analysis. Nodes were removed sequentially, starting from the top-ranked node, up to the point at which the clusters separated. To interpret keyword meaning, we examined the text containing each word using ego-network analysis of the same preprocessed nodes. We focused on degree centrality and eigenvector centrality to interpret the structure of the networks. A network analysis provides insights into the system properties and identifies critical nodes with high centrality (Roth and Cointet 2010; Topirceanu et al. 2018). As the centrality value of a word derived through centrality analysis goes through a standardization process, its importance can be compared across networks.
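As a rough illustration of the two centrality measures named above, the following sketch builds a keyword co-occurrence network and computes both measures in plain Python. It is not the Net-miner implementation: the function names are hypothetical, co-occurrence is counted per post as an assumption, and dividing the degree by n - 1 is only one possible standardization.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(posts):
    """Return {node: {neighbor: weight}}, counting posts in which two keywords co-occur."""
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in posts:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def degree_centrality(graph):
    """Number of neighbors divided by (n - 1), so values are comparable across networks."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def eigenvector_centrality(graph, iterations=500, tol=1e-10):
    """Power iteration on (A + I); the identity shift avoids oscillation on bipartite graphs."""
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        updated = {
            node: scores[node] + sum(w * scores[nbr] for nbr, w in nbrs.items())
            for node, nbrs in graph.items()
        }
        norm = math.sqrt(sum(v * v for v in updated.values())) or 1.0
        updated = {node: v / norm for node, v in updated.items()}
        converged = max(abs(updated[n] - scores[n]) for n in graph) < tol
        scores = updated
        if converged:
            break
    return scores


# Toy keyword lists (invented for illustration only).
posts = [
    ["climbing", "course", "photos"],
    ["climbing", "course"],
    ["climbing", "relaxation"],
]
graph = build_cooccurrence(posts)
degree = degree_centrality(graph)
eigen = eigenvector_centrality(graph)
for word in sorted(graph, key=eigen.get, reverse=True):
    print(word, round(degree[word], 3), round(eigen[word], 3))
```

Degree centrality counts how many distinct keywords a word is linked to, whereas eigenvector centrality also rewards links to keywords that are themselves well connected.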
Next, we refined the network using the same conditions because the number of search keywords differed by type. For "Forest," 137 words were extracted after preprocessing; excluding the top 5 connected keywords (including Ansansi and climbing), the component analysis yielded 51 words. For "Rivers," 66 words were extracted after refinement; excluding the top 3 connected keywords (including Ansansi), the text analysis yielded 27 nodes. For "Park," for which eight search keywords were used in the collection, 488 words were extracted after refinement, and we analyzed the components excluding the top 10 (including Park, Nojeokbong, Citizen, Ansansi, and Gyeonggido). A cohesion analysis was performed several times around the ancestor node, and the final 44 words were extracted to visualize and capture the content. For "Greenery" in public cultural institutions, a component analysis was performed on all nodes except the top 8 (including culture, exhibition, Ansansi, and Gyeonggido) of the 256 words extracted. After preprocessing, most nodes were connected, and 51 words were finally extracted. All the words used in this study were translated using the Korean-English standard dictionary (https://en.dict.naver.com/#/main), with the Korean given in parentheses.
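The reduction procedure, removing top-ranked nodes until the clusters separate and then inspecting the components, might be sketched as follows. The stopping rule used here (stop as soon as more than one connected component appears) is an assumption for illustration; the study describes judging this point from the network itself. All function names and the toy graph are hypothetical.

```python
def remove_top_hubs(graph, k):
    """Drop the k highest-degree nodes and every link attached to them."""
    ranked = sorted(graph, key=lambda n: (-len(graph[n]), n))
    hubs = set(ranked[:k])
    reduced = {
        n: {m for m in nbrs if m not in hubs}
        for n, nbrs in graph.items()
        if n not in hubs
    }
    return reduced, hubs


def connected_components(graph):
    """Return components as sorted lists of nodes, largest component first."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(sorted(comp))
    return sorted(components, key=lambda c: (-len(c), c))


def hubs_until_split(graph, max_k):
    """Smallest k (assumed criterion) at which removing k hubs splits the network."""
    for k in range(max_k + 1):
        reduced, hubs = remove_top_hubs(graph, k)
        if len(connected_components(reduced)) > 1:
            return k, hubs
    return None, set()


# Toy undirected network: a search-term hub linked to every keyword.
graph = {
    "ansansi": {"climbing", "course", "photos", "strolling", "lake"},
    "climbing": {"ansansi", "course"},
    "course": {"ansansi", "climbing", "photos"},
    "photos": {"ansansi", "course"},
    "strolling": {"ansansi", "lake"},
    "lake": {"ansansi", "strolling"},
}
k, hubs = hubs_until_split(graph, max_k=3)
reduced, _ = remove_top_hubs(graph, k)
print(k, hubs, connected_components(reduced))
```

In this toy case a single hub hides two clusters, which become visible once it is removed.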

Expression Keyword of CES
In contrast to approaches that assign words according to a predefined typology, we wanted to identify the types of word attributes without limiting the meaning of CES. The representative words were created by classifying the attributes related to place, activity, object, and image based on the basic meaning of the keywords. This approach was inspired by the three components (forms, practices, relationships) from Stephenson (2008) and Bieling et al. (2014).
A library was built by creating a taxonomy of expression words related to the cultural services mentioned in all types and clustering similar content by word attributes. Grouping keywords with similar attributes in a bottom-up manner resulted in 39 sub-categories. By regrouping keywords with similar properties, 12 intermediate classifications were derived, which were broadly classified into place, activity, object, and image. We classified words as actions if they became verbs by adding "do." If a word meant a visible object or subject, it was grouped as an object; the same applies to words for space and words expressing emotions. For example, seeing "autumn leaves" is a recreational and aesthetic CES value; "autumn leaves" is the object, and "visit" is the activity. In the phrase "along the riverside bicycle path," the activity of enjoying riding is for recreation and health promotion; the keyword "bicycle path" is the place, and "riding" is the activity. As the library also includes content beyond the scope of CES, only the main keywords solely relevant to CES were examined.
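A small sketch may help make the attribute library concrete. The entries below are limited to the examples given in the text, the name `LIBRARY` and the function `classify` are hypothetical, and the study's actual library (39 sub-categories, 12 intermediate classifications) is not reproduced.

```python
# Hypothetical mini-library: keyword -> broad attribute, from the examples in the text.
LIBRARY = {
    "autumn leaves": "object",
    "visit": "activity",
    "bicycle path": "place",
    "riding": "activity",
}

ATTRIBUTES = ("place", "activity", "object", "image")


def classify(keywords, library=LIBRARY):
    """Group keywords by attribute; keywords missing from the library are returned separately."""
    grouped = {attr: [] for attr in ATTRIBUTES}
    unclassified = []
    for keyword in keywords:
        key = keyword.strip().lower()
        attr = library.get(key)
        if attr is None:
            unclassified.append(key)
        else:
            grouped[attr].append(key)
    return grouped, unclassified


grouped, unclassified = classify(["Autumn leaves", "riding", "bicycle path", "totem"])
print(grouped)
print(unclassified)
```

Keeping unmatched keywords in a separate list reflects the bottom-up approach: new words are reviewed and added to the library rather than forced into a predefined category.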
Based on this, the relationship between CES and the semantic structure of expression was analyzed. First, we examined the relationship between keywords and CES by type of urban green space. Second, we looked at cultural services that appeared regardless of the type of green space. To simplify the complex networks, we transformed the network to one mode and retained only the important links with high similarity (weight) using the pathfinder network. Links with low similarity were deleted, so that only keywords with high similarity remained. Through iterative work, we extracted word pairs with high link values, that is, co-occurrence word relationships regardless of the specific type. Many words with activity, object, and place properties were connected to one central keyword to form a word network. Third, multidimensional scaling (MDS) and correlation analysis were conducted on the overlap of cultural services between green space types. To prevent duplication of CES as much as possible, we performed the MDS analysis only on the activity and emotion words in the CES library. Moreover, we analyzed the correlations between type categories by calculating the Pearson correlation coefficient between each pair of types, for both CES words and links.
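The link-pruning idea behind a pathfinder network can be sketched in a few lines. The parameters used in the study are not stated, so this sketch assumes the common setting r = ∞ and q = n - 1, expressed directly on similarities: a path is only as strong as its weakest link, and a direct link is kept only if no indirect path is stronger. The function name and the weights are hypothetical.

```python
def pathfinder_links(similarity):
    """Keep a link only if no indirect path beats it (assumed PFNET with r = inf, q = n - 1).

    similarity: {(a, b): weight} for an undirected network with positive weights.
    Returns the kept links as a set of alphabetically sorted pairs.
    """
    nodes = sorted({n for pair in similarity for n in pair})
    # best[a][b]: strength of the strongest path from a to b,
    # where the strength of a path is the weight of its weakest link.
    best = {a: {b: 0.0 for b in nodes} for a in nodes}
    for (a, b), w in similarity.items():
        best[a][b] = max(best[a][b], w)
        best[b][a] = max(best[b][a], w)
    # Floyd-Warshall style update for maximin (bottleneck) paths.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via = min(best[i][k], best[k][j])
                if via > best[i][j]:
                    best[i][j] = via
    return {
        tuple(sorted((a, b)))
        for (a, b), w in similarity.items()
        if w >= best[a][b]
    }


# Toy similarities (invented for illustration only).
similarity = {
    ("way", "strolling"): 0.9,
    ("strolling", "course"): 0.8,
    ("way", "course"): 0.3,
    ("course", "climbing"): 0.7,
}
print(sorted(pathfinder_links(similarity)))
```

Here the weak direct link between "way" and "course" is dropped because the two words are more strongly connected through "strolling", which is the sense in which only links with high similarity remain.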

Perception in CES by Green Space Types
The analysis results for major keywords by ecosystem type are shown in Table 2. We found differences in the main content, expressive vocabulary, CES-related activities, main space, and usage pattern through the analysis of content by green space type.
Forests were highly associated with health and aesthetic values centered around climbing. The top 30 words were related to mountaineering, such as "climbing," "descending," "traversing," and "top of a mountain." Place words such as "entrances," "forest baths," "Dullegil," "octagonal pavilion," "stairs," and "shelters connected around the course" made it possible to find where cultural services were supplied. In addition, we identified perceptions of aesthetic value through words such as "picture," "photos," and "views," and perceptions of spiritual value through words such as "leisure" and "relaxation." Rivers were associated with high health values centered on "biking" and "walking." Unlike in other types of urban greenery, words such as "tulips" and "maple" also appeared and were connected to festivals. These word pairs carry recreational, heritage, and aesthetic value meanings rather than denoting a single cultural service.
Parks were more highly associated with recreation activities than the other types of green spaces. "Strolling" was the most frequent and representative activity of the park. In addition, cherry blossom viewing, water viewing, and biking were also major recreational activities of CES.
Activity keywords such as "exhibition," "holding," "participation," and "artist" are connected with the keywords "exhibitions," "events," "works," and "sculptures," and thus carry the meaning of cultural heritage. Aesthetic value can also be confirmed through the connection between the activity keywords "photography," "snapping," and "appreciation" and the picture and object keywords "cherry blossoms," "autumn leaves," "roses," "tulips," "landscapes," and "trees," which are linked mainly to festivals. In addition, "experience" carries educational value because it is connected to "program," "play," "insect," "ecology," "nature," and "green," which are keywords of experiential activity. Although there were differences depending on the parks' character and facilities, various cultural services are evidently being provided.
Greenery in PCEF exhibited distinct spatial characteristics that were generally associated with cultural heritage and educational values through activity words such as art, experience, and drawing. However, recreational value can also be grasped through the words "outdoor greenery-sculpture," "playground-experience-garden," "lake," "walk," "date," and "outing." The result of the pathfinder network analysis showed that the overlap of linked words confirms the possibility of the same CES benefits in different types of ecosystems (Fig. 3). Four words have high centrality: way, strolling, exhibit, and climbing. The words way and strolling are closely connected with each other, while exhibit and climbing are connected through the mediating words house and course, respectively, forming separate clusters. Although the analysis was conducted without considering the type of green space, the exhibit and climbing activities constitute separate networks. A specific CES can be identified through a word, such as climbing, that is associated with only one type (Forest) and has no other connecting node. In other words, we can infer what kind of greenery activity is possible merely from the perceived word expression, without spatial data. Our results indicate that the expression of perceived CES differs by green space type, as noted in our previous study (Ko and Son 2018).
In the process of network reduction, keywords with low frequency and link values were removed, leaving only "think" and "relaxation" among the emotion words of CES. The word "way" mentioned here can also be an ambiguous expression, meaning a way of doing something (how to do something, where to go in life), such as how to relax, rather than simply a "road." This result means that simple words (e.g., social media tags) should be used with caution as CES indicators.

Relationship between Keywords and CES
The MDS analysis showed that the explanatory power was 67.4% (Fig. 4). The x-axis represents the degree of activity, ranging from passive participation to positive activity. The y-axis represents the nature of the activity, ranging from dynamic activity in nature to static activity in an artificial space. Activities that can be done individually and those done together were separated. The use of established facilities and active participation or physical movement each follow a certain direction.
At the origin, multiple keywords of social relationships and recreation were superimposed on the same coordinates. Words such as inline, skate, gatherings, outing, and flower viewing were clustered in the first quadrant (x>0, y>0), and concerts, festivals, recital, drawing, learning, and ecotourism in the fourth quadrant (x>0, y<0). Forests and greenery in PCEF can be identified as green space types by activity, whereas rivers and parks are mixed and not easy to distinguish clearly. Many words derived from rivers overlap with those from parks and have similar CES; this is presumably because the park and the river have spatial continuity through the connection of the promenade and the bicycle path.
Correlation analysis can help in quantitatively checking the differences by type of green space (Table 3). We found that the Park category was significantly correlated with the other types and had relatively higher correlation values than the other categories, namely Rivers, Greenery in PCEF, and Forest.
However, the correlation between Park and Forest was not significant in the link-based analysis by type, although it was significant in the node-based analysis. There was no significant correlation between the other green space types, apart from Park.
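For reference, the Pearson correlation coefficient between two green space types can be computed from their keyword profiles as in the sketch below. The vectors are invented frequencies over a shared keyword list and are not the study's data; significance testing, as reported in Table 3, is not covered here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical keyword frequencies over the same ordered keyword list,
# e.g. ["strolling", "biking", "climbing", "photos"].
park = [5, 3, 0, 2]
river = [4, 2, 0, 1]
forest = [0, 1, 6, 0]

print(round(pearson(park, river), 3))
print(round(pearson(park, forest), 3))
```

A high coefficient between two types indicates that they share a similar keyword profile, which is how the overlap between Park and the other categories can be read.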

The problem of duplicate calculation of CES
In this study, we applied an inductive free-listing approach to explore people's subjective perception of CES, focusing on the words with which people express their experiences of using urban greenery (Fagerholm et al. 2012). Although this research found that the various words used to express opinions about CES can be classified based on the meaning of a specific keyword, we also found that one keyword can overlap with one or more CES and have multiple meanings. For example, "flower viewing," "outing," and other physical activities can overlap with social activities such as "meeting" or "gathering." Trees, cherry blossoms, and autumn leaves form part of the natural environment and are associated with aesthetic value; however, they also play a role in cultural heritage values because they are related to festivals and events. Therefore, in this study, the library was created by assigning attributes to words after completing the word analysis, without assigning words to CES in advance. The characteristics of CES were examined through the created library. The association of certain types of CES is unsurprising given that many ecosystem services types ( As CES expression words can overlap in meaning, the method of assigning words to each CES and evaluating them is not appropriate, and more research on expression words is needed. Social media-based indicators should be carefully considered when used to evaluate CES, as there is uncertainty regarding lexical representation. In other words, there is no clear answer regarding what types of CES people use and benefit from. As more usable big data are created, it is necessary to develop a method that can effectively use such data in studies. To efficiently use big data to evaluate words that denote user perceptions of CES, we require additional research into word-embedding and ontology methodologies that can evaluate expressions and word-specific ecosystem services.
In natural language processing, word embedding is an effective method for extracting semantic and syntactic information from a large corpus (Lai et al. 2016). An index should be established to understand the value and meaning of individual words, word pairs, and the relationships between words related to CES. Although it is not easy to develop indicators using natural language (Bieling 2014), we expect that using visual and physical indicators conjunctively to assess CES will enable researchers to more clearly identify the nonmaterial benefits of CES.

Evaluable and non-evaluable indicators
In this study, different CES were provided by different types of urban green space; some were clearly distinguished based on user text, whereas others were difficult to distinguish by text alone. Although an inductive method was used to find as many expression words for cultural services as possible, it was difficult to identify cultural services by keywords other than activities. This study revealed the importance of separating CES that can be evaluated through text indicators from those that cannot. We were unable to grasp other elements of CES, such as spiritual and religious values, because it was uncommon to find religious views discussed in blog posts. We analyzed emotion words separately, but their frequency and link values were low; therefore, very few emotion words remained among the main keywords. The inability to examine all CES expressions is a limitation of this study.
It is necessary to distinguish between services that can be evaluated using indicators derived through the objectification of cognition and services that are difficult to assess because of individual differences. Specifically, services can be evaluated through the development of indicators such as user activity, words related to activities, expressive vocabulary for service recognition, and emotional expressions of positive and negative experiences in a space (Bieling 2014). It is particularly difficult to evaluate artistic inspiration, spirituality, and religion through a text index that views space as a CES, as these depend completely on a subjective point of view. The materials in this study were not prepared with ecosystem service evaluation in mind. Rather, people expressed their perceptions freely in their own words. As CES assessment requires public participation, social media could be the best path to "citizen science" (Chen et al. 2018). We investigated whether big data can be used to replace survey or interview methods. Through the text analysis, we identified the phrases "the mountain behind the house" and "near the house." Moreover, the word "home" appeared with high frequency. We therefore presumed that writing about daily life was common in the form of thought blogs. Thus, by analyzing big data, we were able to understand daily activities, which are difficult to understand through typical survey methods. The CES were determined by the urban and geographical landscape in Korea. The frequent use of keywords related to mountain climbing and forests is a result of the easily accessible high and low forests in Ansan. Additionally, owing to Ansan's geographical characteristics, urban parks are often located in low hills rather than in flat areas. CES located in forests may also be perceived as having cultural heritage, spiritual, and religious value, owing to the influence of the temples, cemeteries, or totems found within them (Ko and Son 2018).
We can predict that other cities in the country may reveal equivalent results because of the topographical characteristics of Korean cities. In addition, owing to indigenous religions or particular religious sites such as Seonangdangs (shrines to the village deity) in each region, the spiritual and religious values referred to in CES abroad are recognized as differences in CES. As noted in previous studies, CES are often associated with indigenous and native languages (Schnegg et al. 2014; Wartmann and Purves 2018). Thus, because of differences in the natural environment and cultural history of each region, ecosystem service assessment and evaluation criteria are needed for each country.

Conclusions
This study examined how CES are perceived and what keywords define urban green spaces through long texts written in everyday life. It confirmed that an attempt to quantify subjective perception is possible through text mining. Despite the limitation of working with a small dataset, this bias was useful here, as we were particularly interested in examining how words interact with CES in text data that many people use and reproduce for information retrieval.
Our work demonstrated that social media data on CES greatly extend the type and, especially, the volume and scale of information that can be derived from traditional survey methods (Retka et al. 2019; Zhang et al. 2020). However, the data in this study were limited, as only blog texts from social media platforms were analyzed. Nevertheless, as the number of users accessing social media from their smartphones increases and the user base continues to grow, studies relying on social media data could become more representative by including different age groups.
It will be very useful if the meaning of words and the subtle differences between words can be further subdivided by utilizing more texts generated in the post-corona era (an era where non-face-to-face methods are becoming less common). If the meaning of the expression can be derived and used as a relative indicator of CES, it is likely that the qualitative aspects, which are currently alienated from evaluations of ecosystem services, can be analyzed.
We will continue to explore this area in subsequent studies. It is necessary to build a dictionary of CES that can be used as a lexicon and index, like that of Vigl et al. (2021), who used text analysis by processing the entire Wikipedia ontology. If it becomes possible to develop an ontology system that automatically establishes the concepts between words according to the type of ecosystem service, then, first, CES can be evaluated quantitatively through the relationships between keywords, and second, if the keyword network and natural asset information are linked, an integrated semantic information retrieval system can be implemented.

Declarations

Funding
The author declares that this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Competing Interests
The author has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Author Contributions
This research originated from a doctoral dissertation, with much of it performed by Ha-Jung Ko. The advice of Professors Seong-Woo Jeon, Dong-Kun Lee, and Yong-Hoon Son were sought and implemented during the doctoral thesis review process.

Data availability statement
The data used for this study are available from the author on reasonable request.

Figure 1 Map of Ansan city, Republic of Korea
Figure 2 Workflow of data collection and analysis
Figure 3 The co-occurrence word relationships with high links (red: activity, blue: emotion, mint: object, gray: place; a larger node means a higher eigenvector centrality, and a thicker line means a higher link value)