Match-fixing occurs persistently in sports and undermines fairness, which is the fundamental value of sport. Various methods have been proposed to address this problem, including efforts to build anomaly-detection models for match-fixing from match data.
In the academic literature, studies have attempted to detect match-fixing from anomalous match data. Kim et al. [26] converted sports dividend yield data into graphs and applied a convolutional neural network (CNN) to classify matches as normal or abnormal by comparing their dividend yield graphs. Ötting et al. [24] fitted a generalized additive model for location, scale and shape (GAMLSS) to dividend yield and betting volume data to identify differences between fixed and non-fixed matches and evaluated the model’s ability to detect fixed matches.
Previous studies have examined suspected matches using a single model based on football match dividend yield data, with a reported accuracy of 70–80% and a misclassification rate of approximately 20%. However, the biases and errors inherent in single-model analyses hinder their practical application. Consequently, the current study constructed a database of variables such as match results, league rankings, and players, and proposed using several AI models to detect anomalies in dividend yields as an approach to the match-fixing problem.
To reduce the errors of a single model, this study relied on four classification models frequently used in machine learning: logistic regression (LR), random forest (RF), support vector machine (SVM), and k-nearest neighbors (KNN). In addition, this study used an ensemble model, which is an optimized combination of the previous four. Using these five models, this study aimed to distinguish between normal and abnormal matches. The accuracy achieved in classifying matches was higher than that reported in previous research: three models (RF, KNN, and ensemble) showed an accuracy of over 90%, and two models (LR and SVM) showed an accuracy of 80%. Because each model flags different suspicious cases, the models were used in combination to identify suspicious matches, which reduces the likelihood of treating valid matches as suspicious.
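The idea of combining the five models' outputs can be illustrated with a minimal sketch. It assumes that each model has already produced a 0/1 label per match (1 = abnormal) and that a match is treated as suspicious only when a minimum number of models agree. The function name `consensus_flags`, the `min_votes` threshold, and the example labels are hypothetical and are not taken from the study, which does not specify its combination rule at this level of detail.

```python
from typing import Dict, List


def consensus_flags(predictions: Dict[str, List[int]], min_votes: int = 3) -> List[bool]:
    """Flag a match as suspicious when at least `min_votes` models label it abnormal.

    `predictions` maps a model name to its 0/1 labels (1 = abnormal), one per match.
    The voting threshold is an illustrative assumption, not the study's rule.
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("every model must label the same number of matches")
    n_matches = lengths.pop()

    flags = []
    for i in range(n_matches):
        # Count how many models consider match i abnormal.
        votes = sum(labels[i] for labels in predictions.values())
        flags.append(votes >= min_votes)
    return flags


# Made-up labels for four matches, for illustration only.
example = {
    "LR":       [0, 1, 1, 0],
    "RF":       [0, 1, 0, 0],
    "SVM":      [1, 1, 0, 0],
    "KNN":      [0, 1, 1, 0],
    "ensemble": [0, 1, 0, 0],
}
flags = consensus_flags(example)
```

In this toy input, the first match is flagged by SVM alone and the third by only two models, so neither reaches the three-vote threshold; only the second match, on which all five models agree, is flagged. Raising or lowering `min_votes` trades missed cases against false alarms.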
Moreover, after collecting data from real-time matches, we applied the five models to construct a system capable of detecting match-fixing in real time. The models are built on previous match data and are applied to match data collected in real time to identify suspicious matches. Our study aimed to provide an environment for real-time analysis and investigation by building a system that collects data before and during matches, decides whether a match is suspicious, and enables a prompt response.
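The collect-decide-act loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the system built in the study: the class `RealTimeMonitor`, the helper `jump_rule`, the thresholds, and the dividend yield values are all hypothetical, and the toy rules merely stand in for the trained LR, RF, SVM, KNN, and ensemble classifiers.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# A "model" here is any callable that maps the dividend yield series observed
# so far to True (abnormal) or False (normal). Trained classifiers would be
# wrapped to fit this interface; that wrapping is assumed, not shown.
Model = Callable[[Sequence[float]], bool]


class RealTimeMonitor:
    """Re-evaluates a match each time a new dividend yield snapshot arrives."""

    def __init__(self, models: Dict[str, Model], min_votes: int) -> None:
        self.models = models
        self.min_votes = min_votes
        self.history: Dict[str, List[float]] = defaultdict(list)

    def update(self, match_id: str, dividend_yield: float) -> bool:
        """Store the snapshot and return True if the match is now suspicious."""
        series = self.history[match_id]
        series.append(dividend_yield)
        votes = sum(1 for model in self.models.values() if model(series))
        return votes >= self.min_votes


def jump_rule(threshold: float) -> Model:
    """Toy stand-in for a trained model: flags any jump larger than `threshold`."""
    def model(series: Sequence[float]) -> bool:
        return any(abs(b - a) > threshold for a, b in zip(series, series[1:]))
    return model


monitor = RealTimeMonitor(
    {"a": jump_rule(0.2), "b": jump_rule(0.3), "c": jump_rule(0.5)},
    min_votes=2,
)
# Made-up snapshots collected before and during one match.
alerts = [monitor.update("match-001", y) for y in (1.90, 1.92, 1.88, 2.30)]
```

With these made-up values, the monitor stays silent for the first three snapshots and raises an alert on the fourth, when the jump of about 0.42 exceeds two of the three toy thresholds. Keeping a per-match history lets the decision be revised as each new snapshot arrives rather than only after the match ends.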
However, previous research on data-based statistical detection of match-fixing found that match-fixing cases are relatively few compared with normal matches [18], which the current study confirmed. Real-time data collection on sports matches could contribute to the creation of a more accurate detection system.
Whether a match is fixed cannot be determined solely from abnormal patterns and data [22]. However, the detection model could help distinguish abnormal from normal matches in real time and provide more detailed data to facilitate the investigation of match-fixing cases. Moreover, it could benefit the public, as these real-time data would make it possible to detect match-fixing in games. Furthermore, the detection model may deter match-fixing brokers and players, as they would be aware of the risk of real-time detection. The results of this study could guide the future detection of match-fixing in sports.