The Influence of Borrowers' Language Features on P2P Lending

Against the background of the reshuffle of the P2P market in China, this paper draws on theories of language function to investigate the influence of four language features of borrowers on their funding success and default rate. Using logistic regression models, we find that the more redundant the borrower's language expression, the more open and objective its content, and the more attention the borrower pays to punctuation details, the easier it is to obtain a loan. When the borrower's description is more redundant and more attention is paid to punctuation details, the probability of default is lower. Taking education level into consideration, we find that the negative association between description redundancy and the default rate becomes stronger as the borrower's education level increases. We therefore conclude that the four linguistic features defined in this paper can alleviate the information asymmetry problem of P2P lending to some extent and can be incorporated into risk control systems.


Introduction
At present, Chinese Internet financial business mainly includes third-party payment, crowdfunding, P2P lending, digital currency, big-data finance and others, among which P2P lending has the largest market scale. However, as a new mode of loan financing, P2P lending lacks complete mechanisms for risk control and early warning, which has created increasingly serious risk exposure. The "thunder tide" (a wave of collective debt evasion by P2P platforms), which began in June 2018 and broke out in July, shocked investors. After the rapid expansion of P2P lending and its current contraction, how to perfect the risk control system and reduce the information asymmetry between borrowers and lenders has undoubtedly become an urgent problem. Unlike traditional bank-mediated lending, P2P lending was originally defined as private lending without any intermediary or guarantee (Lin, Prabhala, & Viswanathan, 2013). As a third party, a P2P platform only provides information exchange and credit evaluation services for borrowers and lenders. However, we find that many P2P platforms tend to introduce field certification and institutional guarantees in order to reduce borrowers' default risk. In fact, in the absence of intermediaries and guarantees, a transaction between borrower and lender is also an information game between the two sides (Wang & Kong, 2014). The main risk sources of P2P lending are moral hazard and adverse selection caused by information asymmetry (Simkins & Rogers, 2004), and many scholars have studied this problem extensively.
What kind of people are more likely to get loans? What kind of people will repay as promised? These two questions have been studied extensively. Research on borrowers' demographic characteristics has found racial, gender and beauty discrimination in the lending market (Pope & Sydnor, 2011; Laura et al., 2014). With the development of P2P online lending, borrowers' historical behavior data have also become a reference for lenders. The results of Puro et al. (2010) and Chen and Ding (2013) show that the borrower's credit score, total repayment ratio, debt record, number of overdue repayments and other historical data play an important role in whether the borrower obtains the loan. These factors also help indicate whether the borrower is likely to default (Tao & Lin, 2016).
In addition to individual information, borrowers' interpersonal and social network relationships have been the subject of extensive empirical research. Based on analyses of social network relationships, a large body of literature has found that social capital and friendship ties among borrowers improve their credibility to a certain extent, thereby promoting loan success and reducing default (Cassar & Wydick, 2010; Godlewski, Sanditov, & Burger-Helmchen, 2012; Lin, Prabhala, & Viswanathan, 2013). There is also obvious herding behavior in the online lending market. For instance, Luo and Lin (2012) observed that friend bids and bid counts have significant effects on investors' decision-making time, which is considered evidence of herding. Lee et al. (2012) also found strong evidence of herding, with a diminishing marginal effect as bidding advances, on a P2P platform in Korea.
It is not difficult to see that the interpersonal relationship between borrower and lender occupies a distinctive place among the borrower's social relationships. The borrower's purpose is "getting", that is, obtaining "giving" from the lender, and the interpersonal function between the two parties is usually embodied in linguistic communication (He, 2018; Galema, 2019). On this basis, many scholars have studied the descriptive text of loan requests. Word use is generally considered a meaningful marker of cognitive and social processes (Pennebaker, Mehl, & Niederhoffer, 2003). The description can reflect a borrower's identity features, such as "success", "diligence" and "reliability", which are related to loan success and failure (Larrimore, 2011; Li, 2014; Wang, 2015). For example, Herzenstein et al. (2011) found that identities such as being trustworthy or successful are associated with increased loan funding but, ironically, are less predictive of loan performance than other identities (moral and economic hardship).
However, because language interaction is subjective, different information receivers (investors) perceive the features of the same information differently (Manuti, Traversa, & Mininni, 2012), so extracting the borrower's identity features by reading descriptions subjectively can introduce bias. As a result, some scholars derive features from the text itself and study the effect of intuitive text features such as punctuation, text length and misspelling on predicting loan success and failure (Dorfleitner, 2015; Chen, Huang, & Ye, 2018). Their results show that these intuitive text features affect the lender's granting decision but cannot effectively predict the borrower's repayment probability.
Can intuitive text features really not predict default? Language has a discourse function, an interpersonal function and an experiential function (Halliday, 1983). The interpersonal function is often related to the personal experience and personality traits of both the speaker and the receiver (Frost, Czyzewska, & Kasch, 1992), and it reflects the speaker's attitude toward the content (Zhang, 2004). A further consideration is the difference in language structure between English and Chinese. Chinese characters are ideographic, so they have a stronger capacity for combination (Foo, Schubert, & Hui, 2004). Information in Chinese text is carried mainly by phrases rather than by space-delimited words (Zagibalov & Carroll, 2008). We therefore believe that, depending on their educational background, individuals develop different habits when combining phrases to express information; these habits reflect differences in expressive ability and thus differences in personality. Building on existing research perspectives and results, we extract language features from the intuitive text features of borrowers' loan requests and examine whether they predict funding success and loan default.

Hypotheses development
In this paper, we summarize four language features of borrowers: expression redundancy degree, objectivity degree, short-sentence preference degree and punctuation control degree. Expression redundancy measures whether the borrower is wordy and verbose when expressing information; objectivity measures how detailed and objective the borrower is when disclosing personal information; short-sentence preference measures the borrower's preference for short sentences; and punctuation control measures the borrower's attention to and control of punctuation. Their specific definitions and calculations are explained in detail in a later section. Here, we formulate hypotheses about how these features influence funding success and default rates based on previous research.
Studying intuitive prediction, Kahneman and Tversky (1973) concluded that when making predictions and judgments under uncertainty, people rely on a limited number of heuristics, such as representativeness. Investors therefore tend to read selectively, looking for key information or words that are representative of the borrower's personality in order to assess the borrower. Consequently, if the borrower's loan title and description are too verbose and favor long sentences that may bury the key information, the lender's granting decision will be disturbed to a certain extent. Previous studies have also included basic readability measures of the loan description (i.e., average word and sentence length) as control variables (Pope & Sydnor, 2011). In addition, the use of punctuation in a text reflects the writer's control and cognitive ability. Excessive punctuation makes a loan description informal and reduces its readability (Chen & Huang, 2018). We therefore suppose that whether the loan description ends with a punctuation mark also affects its readability to a certain extent. Research shows that lenders prefer borrowers who disclose their personal information proactively (Jiang, Wang, & Chen, 2018). For instance, some borrowers use their real names or telephone numbers as account names and disclose detailed occupation and income information by quoting figures in their loan descriptions, even though the platform does not require this. It can thus be inferred that the objectivity of the borrower's expression is positively associated with funding success.
Therefore, we propose Hypothesis 1 as follows:
H1a: Redundancy degree in loan requests is negatively associated with funding success.
H1b: Short-sentence preference degree in loan requests is negatively associated with funding success.
H1c: Objectivity degree in loan requests is positively associated with funding success.
H1d: Punctuation control degree in loan requests is positively associated with funding success.
Funding success measures the lender's response to the borrower's information disclosure, whereas repayment probability reflects the borrower's financial status and credit situation, which are often closely related to personal characteristics. Previous studies have shown that personality qualities and personal identity content play an important role in predicting loan default (Larrimore et al., 2011). The more key information a loan description contains, the less likely the borrower is to default (Yu, 2017). Under the definition used in this paper, a high expression redundancy degree means that a text of the same length contains less key information. Borrowers' self-expression largely serves to maintain their image in others' eyes, which opens the possibility of self-whitewashing and deliberate inducement (Baumeister, 1982). To a certain extent, objectivity may also be a tool for borrowers to project an objective image, and we speculate that borrowers who whitewash in this way are often less honest than they appear. The borrower's preference for sentence length mainly affects the lender's reading experience.
Generally speaking, borrowers who prefer short sentences express themselves more colloquially. Their rigor and self-control may not be as strong as those of borrowers who prefer long sentences, so we predict that their default probability is higher. As mentioned above, punctuation control in loan requests reflects the borrower's self-control. We speculate that borrowers with strong self-control constrain their own behavior more and have a higher repayment probability. Therefore, we propose Hypothesis 2 as follows:
H2a: Redundancy degree in loan requests is negatively associated with default probability.
H2b: Objectivity degree in loan requests is positively associated with default probability.
H2c: Short-sentence preference degree in loan requests is positively associated with default probability.
H2d: Punctuation control degree in loan requests is negatively associated with default probability.

Method

Data source and sample selection
The data used in this study are obtained from Renrendai (renrendai.com), one of the largest peer-to-peer lending platforms in China. The sampling frame consists of all loan requests created on Renrendai between June 1, 2016 and September 30, 2018, from which 13,485 loan requests were randomly selected as research samples. The other required text was then extracted manually from the Renrendai web pages according to the loan number. We preprocessed the collected data as follows: first, we excluded loans with institutional guarantee, field certification or credit assurance certification; second, we excluded loans whose titles or descriptions were blank or contained no language features (for example, "42378", "reserved zzxnejek,", etc.).
After these steps, 3,567 records remained in total, and 2,720 valid records were finally retained after abnormal data were eliminated.

Variable definitions
In this paper, the four language features of borrowers (expression redundancy, objectivity, short-sentence preference and punctuation control) are defined as follows.
Redundancy: expression redundancy measures whether the borrower's language is wordy and verbose. It is calculated by dividing the number of words in the loan description by the number of key information items. Higher redundancy means the borrower conveys less key information in more words; that is, the more useless and wordy words the loan description contains, the higher its redundancy. Key information items are counted by manual annotation, one description at a time. Effective information includes personal identity, position, residence, income, loan purpose (vague purposes such as daily consumption and cash turnover are not counted), car and real estate ownership, repayment source, etc. Some studies have shown that lenders do have certain investment biases regarding the borrower's loan purpose and other effective information (Zhuang, …). For example, if the loan application text is "The loan is mainly used for starting a business. I have real estate without mortgage and I have a good credit with no overdue. The source of repayment is my wage income, which is about 5000 yuan per month, so there is no pressure for repayment.", then "starting a business", "have real estate without mortgage", "no overdue" and "The source of repayment is my wage income, which is about 5000 yuan per month" are four key information items.
Objectivity: an indicator of whether the borrower actively discloses personal information and whether the disclosed information is detailed and objective. The objectivity degree is the number of figures quoted in the loan description, which is also annotated manually. Larrimore et al. (2011) showed that quantitative words likely related to one's financial situation are positively associated with funding success, which they considered an indicator of trust.
We reason that the number of figures quoted can, to some extent, affect the lender's judgment of whether the borrower is trustworthy and hence the granting decision.
For example, if the loan application text is "I have twice borrowed money from Renrendai, with a good repayment record. At present, I have built 4 houses, all of which have applied for real estate certificates. I want to decorate one set for my own use. But due to the shortage of funds, I take this platform to raise funds.", the number of figures quoted is 2.
Short-sentence preference: an indicator of the borrower's preference for short sentences. The short-sentence preference degree is calculated by dividing the number of punctuation marks by the number of words in the loan description. The larger the value, the more the borrower prefers short sentences. Using short sentences can often increase a text's readability.
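As a rough illustration of this ratio, the sketch below computes it for the English translation of the loan text used in the redundancy example. It assumes a particular set of punctuation marks (ASCII plus common full-width Chinese marks) and counts words as whitespace-separated tokens; both choices are assumptions, since the paper does not list the marks it counted and the original Chinese descriptions would need a character count or a word segmenter instead.

```python
# Punctuation marks counted: ASCII plus common full-width Chinese marks.
# This set is an assumption; the paper does not list the marks it counted.
PUNCTUATION = set(".,!?;:") | set("。，！？；：、")


def short_sentence_preference(description: str) -> float:
    """Number of punctuation marks divided by number of words in the description.

    Words are whitespace-separated tokens (an English-text assumption); Chinese
    text would need a character count or a word segmenter instead.
    """
    n_punct = sum(1 for ch in description if ch in PUNCTUATION)
    n_words = len(description.split())
    return n_punct / n_words if n_words else 0.0


example = ("The loan is mainly used for starting a business. I have real estate "
           "without mortgage and I have a good credit with no overdue. The source of "
           "repayment is my wage income, which is about 5000 yuan per month, so there "
           "is no pressure for repayment.")
print(round(short_sentence_preference(example), 4))  # 5 marks / 46 words
```

A higher value means more punctuation per word, i.e. shorter clauses on average; an empty description is mapped to 0.0 here rather than raising an error, which is a design choice of this sketch.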
Punctuation control: an indicator of the borrower's attention to punctuation, measured mainly by whether the loan description ends with a punctuation mark. Although this is a small detail, it reflects to a certain extent how closely the borrower follows written-language conventions. Ending the loan description with a punctuation mark also makes the text more formal. Such a borrower may have the personality trait of seeing things through from beginning to end, which we believe may predict the borrower's performance after successfully obtaining the loan. Other variables are shown in Table 1.
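Once the key-information count has been annotated by hand, the redundancy ratio and the punctuation-control flag are mechanical to compute. The following is a minimal Python sketch, assuming whitespace tokenization for word counts and a hypothetical punctuation set; the function names are illustrative and are not the authors' code. The example text and its key-information count of 4 come from the redundancy definition.

```python
# Hypothetical punctuation set (ASCII plus common full-width Chinese marks).
PUNCTUATION = set(".,!?;:") | set("。，！？；：、")


def redundancy(description: str, key_info_count: int) -> float:
    """Words in the description divided by the manually annotated key-information count.

    Word count is a whitespace split, an assumption that suits English text only.
    """
    if key_info_count <= 0:
        raise ValueError("key_info_count must be positive (it comes from manual annotation)")
    return len(description.split()) / key_info_count


def punctuation_control(description: str) -> int:
    """1 if the description (ignoring trailing whitespace) ends with a punctuation mark, else 0."""
    text = description.rstrip()
    return int(bool(text) and text[-1] in PUNCTUATION)


example = ("The loan is mainly used for starting a business. I have real estate "
           "without mortgage and I have a good credit with no overdue. The source of "
           "repayment is my wage income, which is about 5000 yuan per month, so there "
           "is no pressure for repayment.")
print(redundancy(example, key_info_count=4))  # 46 words / 4 key items = 11.5
print(punctuation_control(example))           # ends with "." -> 1
print(punctuation_control("资金周转"))          # no closing mark -> 0
```

How to treat descriptions with zero annotated key items is not specified in the text; this sketch raises an error rather than guessing a value.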

Model Construction
In our study, we employ logistic regression models to study the influence of the borrower's language features (redundancy, objectivity, short-sentence preference and punctuation control) on the funding probability and default probability in the P2P lending market. Our empirical model is as follows:
$$\ln\frac{P(Y_i=1)}{1-P(Y_i=1)} = \beta_0 + \beta_1 Red_i + \beta_2 Obj_i + \beta_3 Sen_i + \beta_4 Pun_i + \gamma' X_i + \varepsilon_i$$
where the dependent variable $Y_i$ is a binary variable equal to 1 if borrower $i$'s loan request is successfully funded (or, in the default model, if the borrower defaults after receiving funding) and 0 otherwise. The main explanatory variables are Red (redundancy), Obj (objectivity), Sen (short-sentence preference) and Pun (punctuation control), the four language features extracted in this paper. $X_i$ is a vector of control variables, including borrowing amount, interest rate, the borrower's credit score, age, education level, income, marital status, working experience, etc., and $\varepsilon_i$ is the random disturbance term.
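A model of this form is estimated by maximum likelihood. The sketch below shows one standard way to do that, Newton-Raphson (iteratively reweighted least squares) in NumPy. The data are simulated and the coefficient values are arbitrary illustrations, not estimates from the Renrendai sample; the control vector is omitted for brevity, and the paper does not state which software it used.

```python
import numpy as np


def fit_logit(X: np.ndarray, y: np.ndarray, max_iter: int = 50, tol: float = 1e-10) -> np.ndarray:
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X must already include a column of ones for the intercept beta_0.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted P(Y_i = 1)
        grad = X.T @ (y - p)                          # score of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # X' W X (negative Hessian)
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated illustration (hypothetical values, not the paper's data).
rng = np.random.default_rng(42)
n = 5000
red = rng.normal(size=n)           # stand-in for Red_i
obj = rng.poisson(1.5, size=n)     # stand-in for Obj_i (count of figures quoted)
sen = rng.normal(size=n)           # stand-in for Sen_i
pun = rng.integers(0, 2, size=n)   # stand-in for Pun_i (0/1 flag)
X = np.column_stack([np.ones(n), red, obj, sen, pun])
true_beta = np.array([0.2, 0.5, 0.4, -0.3, 0.8])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_hat = fit_logit(X, y)
print(np.round(beta_hat, 2))  # should lie close to true_beta
```

In practice a statistics package would also report standard errors (from the inverse of X'WX at convergence), which are needed for the significance levels reported in the tables.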

Descriptive statistics
First, descriptive statistics for the variables are reported in Table 2. Our sample includes 2,720 loan requests, of which 2,474 were successfully funded and the remaining 246 were not. Among the funded requests, 1,331 defaulted and 1,143 were repaid on time. The statistics also give a rough picture of the demographic characteristics of the online lending market: male borrowers are more active (the mean of the gender variable is 0.8739), and borrowers are mainly aged between 20 and 40.
Married borrowers account for a large proportion (the mean is 0.6235). The average educational background of borrowers is above college level, higher than the social average.
Borrowers' average working time is between 1 and 3 years, but their income level is generally low (the mean of the income variable is 2.9353). Table 3 lists the correlation coefficients among the four key explanatory variables. The variables are correlated, but the correlations are relatively low, which suggests that multicollinearity is not a serious problem, so the subsequent regression analysis can proceed.
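Low pairwise correlations are a useful first check, but multicollinearity can also arise from a combination of several regressors. A common complementary diagnostic is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing regressor j on all the others. The sketch below is a hedged illustration on simulated stand-ins for Red, Obj, Sen and Pun; the paper does not report VIFs, and thresholds such as 5 or 10 are conventions rather than results of this study.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor of each column of X (X has no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Hypothetical data: four regressors, with mild correlation between the last two.
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 4))
base[:, 3] += 0.4 * base[:, 2]
print(np.round(np.corrcoef(base, rowvar=False), 2))
print(np.round(vif(base), 2))  # values near 1 indicate little collinearity
```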

Language features and loan success rate
In this subsection, we examine the relationship between the four variables reflecting the borrower's language features and funding success. A total of 836 samples were used in the regressions, and the results are reported in Table 4. Column (1) shows that borrowers with higher credit ratings, older age, longer working time and higher education are more likely to be funded, and that lenders prefer requests with lower loan amounts. Column (2) examines the relationship between the four main independent variables and funding success without control variables: expression redundancy, objectivity and punctuation control are all significant at the 5% level. In column (3), after adding the control variables, these three variables remain significant at the 10% level. Compared with model (1), model (3), which adds redundancy, objectivity, short-sentence preference and punctuation control, shows some improvement. As a result, H1c and H1d are supported, but H1a and H1b are not.

Language features and loan default rate
In Table 5, we test Hypothesis 2 by examining the relationship between the four language feature variables and the default rate. The regressions use 2,474 samples. Column (4) shows that borrowers with lower credit ratings, older age and lower education are more likely to default. Borrowers with higher income show higher default risk and accordingly face higher interest rates. In column (5), redundancy and punctuation control are negatively related to the default rate at the 5% significance level. In column (6), after adding the control variables, both remain significant at the 5% level and their signs are unchanged (negative). Compared with model (4), model (6), which includes the four variables, improves goodness of fit and prediction accuracy to a certain extent. Here, H2d is supported, while H2a, H2b and H2c are not.
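The text does not specify which goodness-of-fit and accuracy measures were compared between models (4) and (6). Two common choices for logistic regression are McFadden's pseudo-R², 1 - LL_model / LL_null, where the null model predicts the sample mean for everyone, and classification accuracy at a 0.5 threshold. A minimal pure-Python sketch with hypothetical outcomes and fitted probabilities:

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood of outcomes y under fitted probabilities p."""
    return sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y: Sequence[int], p: Sequence[float]) -> float:
    """McFadden pseudo-R^2; assumes y contains both 0s and 1s."""
    ybar = sum(y) / len(y)
    ll_null = log_likelihood(y, [ybar] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null


def accuracy(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    """Share of cases where (p >= threshold) matches the observed outcome."""
    return sum((pi >= threshold) == bool(yi) for yi, pi in zip(y, p)) / len(y)


y = [1, 0, 1, 1]            # hypothetical default indicators
p = [0.8, 0.2, 0.6, 0.9]    # hypothetical fitted default probabilities
print(round(mcfadden_r2(y, p), 3), accuracy(y, p))
```

Comparing these two numbers for a model with and without the language features is one way to quantify the "improvement" described above.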

Language features and education levels
After studying the influence of borrowers' language features on funding success and default, we further trace the main factors that shape borrowers' language features. Previous studies agree that education has a significant effect on the funding success rate and default rate of P2P lending (Gathergood, 2012). We therefore add four interaction terms between the language features and education level, respectively, for further analysis.
The regression results in Table 6 show that, in the funding success model, the moderating effects of education on redundancy and punctuation control are not significant (columns (8) and (9)), whereas the interaction term Obj*Edu is significant at the 5% level (column (8)) with a negative coefficient. This indicates that the higher the education level, the weaker the positive effect of objectivity on funding success; in other words, education negatively moderates the relationship between objectivity and funding success. Similarly, in the default model, education has no significant moderating effect on punctuation control (column (11)), but the coefficient on the interaction term Red*Edu is significantly negative at the 10% level (column (10)). That is, the higher the borrower's education level, the stronger the negative effect of redundancy on the default rate. We conclude that the borrower's education level plays a moderating role in the relationship between language features and funding success or default.
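The moderation argument can be made explicit with the default model that includes the Red*Edu interaction. This is a sketch assuming the interaction term is simply added to the baseline specification, with $Edu_i$ remaining among the controls $X_i$; $\delta$ is a symbol introduced here for the interaction coefficient. The slope of the log-odds with respect to redundancy then depends on education:

```latex
\begin{align*}
\ln\frac{P(Y_i=1)}{1-P(Y_i=1)} &= \beta_0 + \beta_1 Red_i + \beta_2 Obj_i + \beta_3 Sen_i + \beta_4 Pun_i
  + \delta\,(Red_i \times Edu_i) + \gamma' X_i + \varepsilon_i \\
\frac{\partial}{\partial Red_i}\,\ln\frac{P(Y_i=1)}{1-P(Y_i=1)} &= \beta_1 + \delta\, Edu_i
\end{align*}
```

With $\beta_1 < 0$ and $\delta < 0$, the slope becomes more negative as $Edu_i$ rises, which is the "stronger negative effect" reported for redundancy. The same reasoning with a positive $\beta_2$ and a negative coefficient on Obj*Edu gives a weaker positive effect of objectivity on funding. This holds on the log-odds scale; effects on the probability scale also depend on the values of the other covariates.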

Robustness test
This subsection reports two robustness tests of the empirical results in the previous subsections. Method 1: resampling. When testing Hypothesis 1, 836 samples were used, in which the ratio of successful to failed funding is about 1:4. In the robustness test, we resample 681 observations in which this ratio is about 1:2. Method 2: adjusting the control variables. The logistic regression models above include nine control variables, such as gender, age and marital status. When testing the robustness of the default prediction model, we add two further control variables, the number of loans and the number of successful loans, for a total of 11 control variables. The results are consistent with the previous results; owing to space limits we do not report them in detail here, but our empirical results are robust.
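The text does not describe how the 681 resampled observations were drawn. One common way to change class balance is undersampling: keep every record of the smaller class and randomly draw records of the larger class without replacement until the target ratio is reached. The sketch below assumes that approach; the function name, the record labels and the 1:2 example are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resample_to_ratio(minority: Sequence[T], majority: Sequence[T],
                      majority_per_minority: float, seed: int = 0) -> List[T]:
    """Keep every minority-class record and draw majority-class records without
    replacement, about majority_per_minority per minority record (capped at the
    size of the majority class)."""
    rng = random.Random(seed)
    k = min(len(majority), round(majority_per_minority * len(minority)))
    return list(minority) + rng.sample(list(majority), k)


failed = [f"F{i}" for i in range(10)]    # hypothetical failed requests
funded = [f"S{i}" for i in range(100)]   # hypothetical funded requests
sample = resample_to_ratio(failed, funded, majority_per_minority=2)
print(len(sample))  # 10 failed + 20 funded = 30
```

Fixing the seed makes the resampled set reproducible, which matters when a robustness table is meant to be replicated.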

Discussion
Building trust and reducing information asymmetry is a critical issue in P2P online lending.
At present, a large number of studies focus on borrowers' demographic characteristics (e.g., gender, age and income) and social network relationships, while little attention has been paid to the nicknames, loan titles and loan descriptions borrowers use when applying for loans. This unverifiable information can reveal the borrower's personality and help the lender judge the borrower's integrity. However, descriptive information is not specific or concrete enough on its own and must be further processed and extracted before its relationship with loan success and default can be established. This paper summarizes four language features of borrowers and analyzes their predictive effect on loan success and default on a P2P lending platform. As non-quantified soft information, borrowers' language features help alleviate the information asymmetry of P2P lending and can serve as a measure of borrowers' credit level.
Specifically, the results on language features and funding success show that lenders prefer borrowers who quote more figures in their loan descriptions. Larrimore et al. (2011) also concluded that quantitative words are positively associated with funding success, and our conclusion is consistent with their results. We find that borrowers who pay more attention to punctuation details are more likely to obtain a loan, whereas Chen et al. (2018) reported that the amount of punctuation is negatively associated with funding probability. The difference arises because our focus on punctuation differs: we use whether the loan description ends with punctuation to assess the reliability of borrowers, rather than the amount of punctuation used in the description. Expression redundancy reduces readability to some extent, but redundant descriptions may also contain more key information, which improves credibility. This result is similar to Dorfleitner's (2016) conclusion that the length of the description text in a loan application is positively related to funding success.
Regarding language features and loan default, our results indicate that the borrower's control over punctuation details in the loan description does reflect the borrower's creditworthiness. Attention to formatting conventions can also reflect the borrower's self-control to a certain extent, which indirectly reflects the borrower's control over repayment behavior. We also find that the more redundant the language expression, the lower the default rate; that is, redundancy in the loan description and title shows, through repeated description, the borrower's attention to the loan. This attention constitutes a self-fulfilling psychological expectation about future repayment behavior, which drives the borrower to repay successfully (Schlenker & Weigold, 1992; Berger & Heath, 2007). In addition, short-sentence preference and objectivity are not significantly correlated with the default rate. We reason that sentence length preference is merely an expression habit of the borrower, which has little effect on default.
Borrowers' objectivity is an objective statement of their personal situation, but it cannot guide or constrain their post-loan behavior, so the correlation is not significant here.

Implications for Practice
Our research suggests several strategies borrowers can use to increase trust and thus the likelihood of getting funded. First, borrowers can lengthen their loan descriptions, even at the cost of some wordiness, so as to convey more information that lenders can use to judge their identity. Another effective strategy is to quote more figures when describing the loan. Borrowers should also pay more attention to punctuation: ending the loan description with a punctuation mark gives lenders an impression of reliability. Lenders, for their part, can give more consideration to borrowers with higher education and learn to extract information from borrowers' loan titles and descriptions to assess whether they are trustworthy. According to our empirical results, borrowers who are verbose and control punctuation well are less likely to default. Our conclusions can therefore help mitigate the information asymmetry between borrowers and lenders.

Limitations and future study
This study is subject to several limitations. First, the number of failed loan samples in our study period is relatively small, which may bias the results to some extent. Future research can extend the study period.
Second, our data include borrowers' personal information and whether they were funded and whether they defaulted, but we have no additional information about lenders. Different lenders have different preferences regarding the subject of the loan, so additional research could capture lenders' mental maps from the perspective of decision makers. Third, to keep the definitions of language features objective, we paid little attention to the content of the descriptions. However, narrative language is a rich and varied medium, far more complex than our exploration. Narrative content may affect the lender's mood or the lender's evaluation of the borrower, and quantifying this impact is one direction for future research.

Table 3. Correlation coefficients of the key explanatory variables
                  Redundancy   Objectivity   Short-Sentence   Punctuation
Short-Sentence    0.09***      -0.09***      1
Punctuation       -0.14***     0.13***       0.43***          1
Notes: *p < 0.1, **p < 0.05, ***p < 0.01.