Loan default risk prediction is a major application of machine learning, used by financial institutions to estimate a client's probability of default.
Existing deep learning models rarely consider the connections among application records when detecting loan defaults. We argue that similar records provide valuable auxiliary information for loan default prediction, particularly for records with many missing values. In addition, in practical scenarios the data distribution is imbalanced because default records make up only a small share of the samples, which can lead a model to sub-optimal results. To this end, we propose multi-view loan application graphs, dubbed MLAGs, for small-sample augmentation. Through graph convolution, information from similar records can also be aggregated to alleviate the issue of missing values.
Moreover, a multi-view graph convolution network, named MGCN, is applied to loan default risk prediction. Experiments on three public datasets from real-world home credit and P2P lending platforms show that MGCN outperforms both conventional and deep learning models.
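To make the aggregation idea concrete, the following is a minimal sketch of how similar records might be linked in several views and then smoothed by one graph-convolution-style propagation step. It is not the MGCN architecture itself. Every specific here is an assumption made for illustration: the definition of a view as a subset of feature columns, the k-nearest-neighbour graph construction, the masked distance over jointly observed features, zero imputation, and the function names `knn_adjacency`, `normalize_adjacency`, and `multi_view_propagate`. The step has no learned weights.

```python
import numpy as np


def knn_adjacency(x, k):
    """Symmetric k-nearest-neighbour adjacency for one feature view (assumed construction).

    Missing values (NaN) are skipped: the distance between two records is the
    mean squared difference over the features that both records have observed.
    """
    n = x.shape[0]
    observed = ~np.isnan(x)
    filled = np.where(observed, x, 0.0)
    dist = np.full((n, n), np.inf)
    for i in range(n):
        shared = observed & observed[i]          # features observed in both records
        counts = shared.sum(axis=1)
        sq = ((filled - filled[i]) ** 2 * shared).sum(axis=1)
        valid = counts > 0
        dist[i, valid] = sq[valid] / counts[valid]
        dist[i, i] = np.inf                      # a record is not its own neighbour
    adj = np.zeros((n, n))
    for i in range(n):
        neighbours = np.argsort(dist[i])[:k]
        neighbours = neighbours[np.isfinite(dist[i, neighbours])]
        adj[i, neighbours] = 1.0
    return np.maximum(adj, adj.T)                # make the graph undirected


def normalize_adjacency(adj):
    """Symmetric normalisation D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def multi_view_propagate(features, views, k=2):
    """Average one weight-free propagation step over all views."""
    h = np.nan_to_num(features, nan=0.0)         # zero-impute before aggregation
    outputs = []
    for cols in views:
        adj = knn_adjacency(features[:, cols], k)
        outputs.append(normalize_adjacency(adj) @ h)
    return np.mean(outputs, axis=0)


# Toy application records (hypothetical values); NaN marks a missing field.
records = np.array([
    [1.0, 2.0, 10.0, 0.5],
    [1.1, np.nan, 10.5, 0.4],
    [5.0, 6.0, 50.0, 0.9],
    [5.2, 6.1, np.nan, 1.0],
])
views = [[0, 1], [2, 3]]                         # two assumed feature views
smoothed = multi_view_propagate(records, views, k=1)
print(smoothed)
```

In this toy example each record with a missing field ends up linked to a similar record in both views, so its missing entry receives a non-zero value from that neighbour after propagation. A trained model would additionally apply learned weight matrices and nonlinearities per layer.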