Prediction Poverty Levels of College Students Using a Machine Learning Model

DOI: https://doi.org/10.21203/rs.3.rs-919541/v1

Abstract

Poverty-stricken college students have become a distinct group that makes up a growing proportion of the student population. Accurately identifying the poverty levels of college students so that funding can be provided is a new problem for universities. In this manuscript, we propose a novel model that combines Random Forest with Principal Component Analysis (RF-PCA) to predict the poverty levels of college students. To build the model, data were first collected to establish a dataset comprising 4 classes of poverty level and 21 features of poverty-stricken college students. Feature dimension reduction was then performed in two steps: first, the features were ranked by their Gini importance and SHapley Additive exPlanations (SHAP) values, both computed from a Random Forest (RF), and the top 16 were selected; second, feature extraction with Principal Component Analysis (PCA) reduced these to 11 dimensions. Finally, confusion matrices and receiver operating characteristic (ROC) curves were used to evaluate the performance of the proposed model, which achieved an accuracy of 78.61%. Compared with seven other classification algorithms, the model achieved higher prediction accuracy, and the results suggest that it has great potential for identifying the poverty levels of college students.
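The two-step reduction described in the abstract can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the authors' implementation: the data are synthetic (generated with `make_classification` to match only the stated shape of 21 features and 4 classes), the ranking uses Gini importance alone (the SHAP ranking is omitted because it needs the separate `shap` package), and the standardization step, the train/test split, and all hyperparameters are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the paper's dataset): 21 features, 4 classes.
X, y = make_classification(
    n_samples=1000, n_features=21, n_informative=12, n_redundant=4,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: rank features by Random Forest Gini importance, keep the top 16.
ranker = RandomForestClassifier(n_estimators=200, random_state=0)
ranker.fit(X_train, y_train)
top16 = np.argsort(ranker.feature_importances_)[::-1][:16]

# Step 2: standardize (an assumption), then extract 11 principal components.
scaler = StandardScaler().fit(X_train[:, top16])
pca = PCA(n_components=11, random_state=0)
Z_train = pca.fit_transform(scaler.transform(X_train[:, top16]))
Z_test = pca.transform(scaler.transform(X_test[:, top16]))

# Step 3: train the final Random Forest on the reduced features and evaluate.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(Z_train, y_train)
y_pred = clf.predict(Z_test)

cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2, 3])
acc = accuracy_score(y_test, y_pred)
print("Selected feature indices:", sorted(top16.tolist()))
print("Confusion matrix:\n", cm)
print(f"Accuracy: {acc:.4f}")
```

The scaler and PCA are fitted on the training split only, so no information from the test split leaks into the reduced features. The accuracy printed here reflects the synthetic data and has no bearing on the 78.61% reported in the abstract.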

Full Text

This preprint is available for download as a PDF.