AI-Based Betting Anomaly Detection System to Ensure Fairness in Sports and Prevent Illegal Gambling

doi:10.21203/rs.3.rs-2800498/v1

This work is licensed under a CC BY 4.0 License

Journal Publication

published 18 Mar, 2024

Read the published version in Scientific Reports →

You are reading this latest preprint version

This study develops a solution to sports match-fixing using various machine-learning models to detect match-fixing anomalies based on dividend yields. We use five models to distinguish between normal and abnormal matches: logistic regression (LR), random forest (RF), support vector machine (SVM), the k-nearest neighbor (KNN) classification, and the ensemble model, an optimized model of the previous four. The models classify normal and abnormal matches by learning their pattern with sports dividend yield data. The database was built on the world football league match betting data of 12 betting companies, with a vast collection of data on players, teams, game schedules, and league rankings for football matches. We develop an abnormal match detection model based on the data analysis results of each model, using the match result dividend data. Then, we use data from real-time matches and apply the five models to construct a system capable of detecting match-fixing in real-time. The RF, KNN, and ensemble models recorded a high accuracy of over 92%, whereas the LR and SVM models were approximately 80% accurate. By comparison, previous studies have used a single model to examine suspected matches using football match dividend yield data, with an accuracy of 70–80%.

Physical sciences/Engineering

Physical sciences/Mathematics and computing

Physical sciences/Mathematics and computing/Computational science

Physical sciences/Mathematics and computing/Computer science

Physical sciences/Mathematics and computing/Information technology

Physical sciences/Mathematics and computing/Software

Sports events are hosted in an environment of fair competition among members, which is governed by rules for each game and professional referees that make fair judgments [1,2]. In a fair competitive environment, game results are determined by the internal factors of athletes, including physical ability, effort, and conditions, and external factors, including chance, such as weather, field conditions, and referee standards [3]. The public is enthusiastic about sports because of the uncertainty of the results under various conditions and the belief that the players did their best under fair conditions. However, it is challenging for athletes to increase their competence and train to perform at the highest level in competitions [4,5]. Efforts to ensure fairness in sports are ongoing. To ensure fairness and equal chances of winning for all contestants regardless of different physical abilities, athletes in some sports are classified by gender and weight for competition, and by age in others to ensure equality of opportunity regardless of differences in cognitive ability [6].

Unfortunately, some people predetermine sports results through illegal practices [7,8]. The typical illegal practices include doping, which refers to the use of banned substances, such as performance-enhancing drugs (PEDs) in competitive sports, and match-fixing, which is the act of playing or officiating a match with the intention of achieving a predetermined result by manipulating internal conditions, such as referees, opponents, or coaches [9,10]. There are various types of match-fixing, largely divided into those in the pursuit of financial gain and those involving human networks. In the former case, athletes and brokers involved earn dividends by betting through a betting site, whereas the latter is conducted in pursuit of honor or advantage in entrance exams [11,12]. The most frequent type of match-fixing is related to financial gains. As an average professional athlete is likely to retire in their late thirties, they face uncertainty in their futures and may feel tempted to be involved in match-fixing as an easy way to make money [13]. Match-fixing in sports is emerging as a serious issue and an activity that damages the spirit of sports. Furthermore, it has a substantially negative impact on the sports industry. Therefore, it is necessary to develop a system to detect match-fixing in sports.

2.1. Market risks of match-fixing

Match-fixing in sports may create huge profits for those involved in corrupt activities; however, it has significant negative consequences, such as threatening the integrity of the sport and causing fans to leave. Although people love sports for various reasons, the uncertainty of the results is at the core of this love. As chance factors, such as the condition of the players during the game, influence the match result, the public is enthusiastic about sports and cheers for the athletes. If the match results are manipulated and predetermined without uncertainty, the public will abandon sports and athletes will lose their motivation to compete [14].

If match-fixing continues, it will have a substantial negative influence on sports, and the industry will inevitably shrink. Thus, it is crucial to detect anomalies and match-fixing to protect the future of sports and athletes.

2.2. Detection of behaviors of athletes and those involved in match-fixing

Various studies have been conducted on match-fixing detection. Some of these studies have focused on detection using player behavior patterns. For instance, in 2014, a common-opponent stochastic model (ELO) was developed to predict the outcome of professional tennis matches and identify match-fixing when anomalies arise in athletes' behavior during games and betting [15]. Similarly, a study investigated the behavior of tennis players to detect match-fixing by examining whether the number of rallies between players followed Benford’s Law [16]. To assess match-fixing unrelated to betting, athletes were surveyed on issues other than betting profits, including school admission and coaches’ requests [17].

2.3. Match-fixing detection using betting dividend yields and market price figures

The ability to detect match-fixing through the behavioral patterns of players is limited, because these patterns are influenced by contingent factors and players’ physical conditions. Therefore, it is necessary to set an index that can predict game results to detect anomalies using sports game data. The index, which can predict game results and identify differences between competing teams, can be represented by the sports betting dividend yield [18]. The betting dividend yield is generated by considering a range of factors, including recent performance, game flow, match results, injured players, and penalized players. Strong teams receive low odds, whereas weak teams receive high odds. Odds have often been used as an index to determine the value of athletes and teams in studies to predict match results [19]. In a study on the detection of match-fixing, data from various online betting sites were examined by monitoring betting in real time; match-fixing was determined when an irregular betting pattern occurred for the same game on a specific site [20]. Archontakis and Osborne [21] detected match-fixing by analyzing the betting results of the 2002 World Cup soccer matches using the Fibonacci sequence. In addition, previous studies have used data from the Sportradar Fraud Detection System (FDS), which detects match-fixing based on global betting activities for soccer games [22]. Other studies have attempted to detect match-fixing through the betting dividend yield [23, 24]. This method is considered an effective way of detecting match-fixing and is accepted by the Court of Arbitration for Sport (CAS) as the main evidence in sports match-fixing cases [25, 26].

Continuous efforts have been made to build a system for detecting abnormal signals in sports. To eliminate cheating in sports, further efforts have promoted the introduction of monitoring systems to prevent match-fixing [27]. In addition, as the odds patterns associated with match-fixing occur at specific sites, continuous data collection from these sites can be presented as a solution for eradicating sports match-fixing. The present study proposes a solution for eliminating match-fixing in sports by building a database of a range of variables, including sports results, team rankings, and players, and using an AI-based model to detect anomalies based on the sports betting dividend yield.

This study aimed to build a sports betting database to ascertain anomalies and detect match-fixing through betting dividend yield data. The database contains data on sports teams, match results, and betting dividend yield. A match-fixing detection model was created based on the database.

3.1. Sport database

The database was built on world football league match betting data of 12 betting companies (188bet, Interwetten, Vcbet, 12bet, Willhill, Macauslot, Sbobet, Wewbet, Mansion88, Easybet, Bet365, and Crown), using historical database documentation of iSports API. The latter provides a vast collection of data on players, teams, game schedules, and league rankings for every sports league, including football, basketball, baseball, hockey, and tennis. This study constructed a database using data on soccer matches. As shown in Table 1, 31 types of data were collected.

Table 1

Collected data.
S/N	Form	Description
1	Player	Player Profile
2	PlayerInTeam	Player’s Team Information
3	Team	Team Profile
4	Sclass	League & Cup Profile
5	SclassInfo	Country Team Profile
6	Schedule	Schedule & Results Data
7	DetailResult	Events during the Match (change, score, injury)
8	Company	Sports Betting Site Company
9	MultiLetGoal	Asian Handicap
10	MultiLetGoalDetail	Asian Handicap (changes over time)
11	MultiLetGoalhalf	Asian Handicap Half-Time
12	MultiLetGoalhalfDetail	Asian Handicap Half-Time (changes over time)
13	MultiTotalScore	Over/Under
14	MultiTotalScoreDetail	Over/Under (changes over time)
15	MultiTotalScorehalf	Half-Time Over/Under
16	MultiTotalScorehalfDetail	Half-Time Over/Under (changes over time)
17	Standard	Win-Tie-Loss
18	StandardDetail	Win-Tie-Loss (changes over time)
19	StandardHalf	Half-Time Win-Tie-Loss
20	StandardHalfDetail	Half-Time Win-Tie-Loss (changes over time)
21	EuropeCompany	Data on 200+ European sports betting sites
22	EuropeOdds	Win-Tie-Loss of 200+ European sports betting sites
23	EuropeOddsDetail	Win-Tie-Loss of 200+ European sports betting sites (changes over time)
24	EuropeOddsTotal	Win-Tie-Loss of 200+ European sports betting sites (average)
25	Score	League Ranking
26	CupMatch_Grouping	Cup Ranking
27	CupMatch	Final Cup Ranking
28	SubSclass	Playoffs
29	TeamTechStatistics	Team Statistics
30	PlayerTechStatistics	Player Statistics
31	PlayerTranslate	Player Position & Staff Role


The variables in Table 1 constitute the database, as shown in Fig. 1. Users can request data on dividend yields, user messages, and matches through the Flask server. The Admin PC constantly updates match data and stores it in the database. The database was built in MongoDB and is organized into the following servers: Sport Server, for matches and weather; League Server, for league and cup profiles, league rankings, and events during matches; Odds Server, for betting dividend yields of different categories and betting company sites; and Player Server, for player performance, profiles, and other information.

3.2. Betting models

3.2.1. Support vector machine

Support vector machine (SVM) is a data classification model that uses a decision boundary to separate the data space into two disjoint half-spaces. New input data are classified according to the half-space in which they fall. The larger the gap between the decision boundary and the nearest data points, the more accurate the classification model. This gap, set on both sides of the decision boundary, is known as the margin. In this study, a maximum margin was created to enhance classification accuracy, and the data entering the margin were eliminated.

In a p-dimensional feature space, the SVM hyperplane is defined by Eq. (1) together with the condition $f\left(X\right)=0$ in Eq. (2).

$$f\left(X\right)={\beta }_{0}+{\beta }_{1}{X}_{1}+\dots +{\beta }_{p}{X}_{p}$$

$$f\left(X\right)=0$$

An observation ${X}_{i}$ is assigned to Class 1 (${Y}_{i}=1$) if $f\left({X}_{i}\right)>0$ and to Class 2 (${Y}_{i}=-1$) if $f\left({X}_{i}\right)<0$. With ${Y}_{i}\in \{-1,1\}$, an observation is correctly classified when Eq. (3) holds.

$${Y}_{i}\left({\beta }_{0}+{\beta }_{1}{X}_{i1}+\dots +{\beta }_{p}{X}_{ip}\right)>0$$

A hyperplane satisfying Eq. (3) can separate the data at many different angles. However, for a classification model to be highly accurate, the hyperplane should be optimized by maximizing the margin between the two classes. This leads to finding the maximum margin $M$, as shown in Eqs. (4)–(7). The hyperplane and margin are determined while allowing errors ${ϵ}_{i}$ up to a total budget $C$, after which all data inside the margin are eliminated as outliers.

$\underset{{\beta }_{0},{\beta }_{1},\dots ,{\beta }_{p},{ϵ}_{1},\dots ,{ϵ}_{n},M}{\text{maximize}}\; M$	(4)
$\text{subject to } {\sum }_{j=1}^{p}{\beta }_{j}^{2}=1$	(5)
${Y}_{i}\left({\beta }_{0}+{\beta }_{1}{X}_{i1}+\dots +{\beta }_{p}{X}_{ip}\right)\ge M(1-{ϵ}_{i})$	(6)
${ϵ}_{i}\ge 0, {\sum }_{i=1}^{n}{ϵ}_{i}\le C$	(7)
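
The hyperplane function, the class rule, and the margin constraint can be illustrated with a short sketch. The coefficient values and the sample point below are hypothetical, chosen only so that the squared coefficients sum to 1; this is not the fitted model from the study, and it does not solve the optimization problem itself.

```python
from typing import Sequence


def f(beta0: float, beta: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate f(X) = beta_0 + beta_1 * X_1 + ... + beta_p * X_p."""
    return beta0 + sum(b * xi for b, xi in zip(beta, x))


def classify(beta0: float, beta: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (Class 1) if f(X) > 0, otherwise -1 (Class 2).

    Points lying exactly on the hyperplane are assigned -1 here,
    which is an arbitrary tie-break for this illustration.
    """
    return 1 if f(beta0, beta, x) > 0 else -1


def satisfies_margin(beta0: float, beta: Sequence[float], x: Sequence[float],
                     y: int, margin: float, slack: float = 0.0) -> bool:
    """Check the constraint y * f(x) >= M * (1 - epsilon) for one observation."""
    return y * f(beta0, beta, x) >= margin * (1.0 - slack)


# Hypothetical hyperplane: 0.6**2 + 0.8**2 = 1, matching the norm constraint.
beta0, beta = -1.0, (0.6, 0.8)
point, label = (3.0, 2.0), 1

print(classify(beta0, beta, point))                            # predicted class
print(satisfies_margin(beta0, beta, point, label, margin=2.0))  # inside or outside margin
```

A larger slack value relaxes the constraint for an individual observation, which is how the formulation tolerates points that fall inside the margin.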

3.2.2. Random forest

In the random forest (RF) model, decision trees, which are hierarchical structures composed of nodes and the edges that connect them, help determine the optimal result. A decision tree recursively splits the learning data into subsets. This recursive partitioning repeats on each subset until further splitting adds no predictive value or all values in a subset node are identical to the target variable. This procedure is known as the top-down induction of decision trees (TDIDT), in which the dependent variable Y serves as the target variable in the classification and the vector v of input features is paired with Y, as expressed in Eq. (8).

$$(v, Y)=({x}_{1},{x}_{2},\dots ,{x}_{d}, Y)$$

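While classifying data using TDIDT, Gini impurity may be used to measure how mixed the classes in a set are. It is the probability that a randomly chosen element of the set would be mislabeled if it were labeled at random according to the class fractions ${f}_{i}$ of the set. A set for which this probability is near 0 is said to be pure. Choosing splits that reduce Gini impurity therefore enhances the accuracy of the RF model. For m classes, the Gini impurity is given by Eq. (9).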

$${I}_{G}\left(f\right)={\sum }_{i=1}^{m}{f}_{i}\left(1-{f}_{i}\right)={\sum }_{i=1}^{m}\left({f}_{i}-{f}_{i}^{2}\right)={\sum }_{i=1}^{m}{f}_{i}-{\sum }_{i=1}^{m}{f}_{i}^{2}=1-{\sum }_{i=1}^{m}{f}_{i}^{2}$$
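
As a worked illustration of the final form of the Gini impurity formula, the following sketch computes it for small label sets. The label names are placeholders, not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Return 1 - sum_i f_i**2, where f_i is the fraction of items in class i."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention for this sketch: an empty set is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


pure = ["normal"] * 10
mixed = ["normal"] * 5 + ["abnormal"] * 5

print(gini_impurity(pure))   # single-class set: impurity 0.0
print(gini_impurity(mixed))  # even two-class split: impurity 0.5
```

A split that moves the child nodes toward the first case and away from the second is the kind of split a tree prefers when Gini impurity is the criterion.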

Trees are trained to optimize split function parameters related to internal nodes, as well as end-node parameters, to minimize defined objective functions when v (data), S₀ (trained set), and real data labels are provided. The RF model optimizes and averages the decision tree results using the bagging method before classification. Bagging or bootstrap aggregation, which refers to simultaneously bootstrapping multiple samples and aggregating results from machine learning, is a method that averages diverse models to identify the optimized version.

3.2.3. Logistic regression

Logistic regression (LR) is a supervised learning model that predicts the probability, between 0 and 1, that a given data point belongs to a certain class. The target variable is binary: predicted probabilities in the range 0–0.5 are assigned to one class and those in the range 0.5–1 to the other. LR is linear: the sum of each feature value multiplied by its coefficient, plus the intercept, gives the log-odds of the predicted value, enabling data classification. Therefore, the probability (P) of the event occurring or not occurring was calculated, and the log of the odds in Eq. (10) was used for classification through the final value.

$$\text{Odds}=\frac{P\left(\text{event occurring}\right)}{P\left(\text{event not occurring}\right)}$$

To evaluate how well the model fits the data, it is necessary to calculate and average the loss over the samples. This is referred to as log loss, expressed in Eq. (11), which contains the following elements: m = total number of data points, y⁽ⁱ⁾ = class of data point i, z⁽ⁱ⁾ = log-odds of data point i, and h(z⁽ⁱ⁾) = sigmoid of the log-odds. The coefficients that minimize the log loss give the optimized model.

$$-\frac{1}{m}{\sum }_{i=1}^{m}\left[{y}^{\left(i\right)}\log\left(h\left({z}^{\left(i\right)}\right)\right)+\left(1-{y}^{\left(i\right)}\right)\log\left(1-h\left({z}^{\left(i\right)}\right)\right)\right]$$

Once the log-odds, or the feature coefficient values, were calculated, they could be applied to the sigmoid function to obtain an output between 0 and 1, interpreted as the probability that the data belong to a given class. In this study, a loss function was used to identify values near 0 or 1 to sort normal and abnormal matches.
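
The relationship between log-odds, the sigmoid function, and log loss can be sketched as follows. The function names and the toy inputs are illustrative assumptions, and the sketch only evaluates the loss for given log-odds; it does not fit the coefficients.

```python
import math
from typing import Sequence


def log_odds(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Linear predictor: intercept plus the sum of coefficient * feature value."""
    return intercept + sum(c * xi for c, xi in zip(coefs, x))


def sigmoid(z: float) -> float:
    """Map log-odds z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true: Sequence[int], z_values: Sequence[float]) -> float:
    """Average negative log-likelihood over m samples with labels in {0, 1}.

    This naive form assumes moderate log-odds; very large magnitudes
    would make sigmoid(z) round to exactly 0 or 1 and break math.log.
    """
    m = len(y_true)
    total = 0.0
    for y, z in zip(y_true, z_values):
        h = sigmoid(z)
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total / m


# Toy example: two samples, the first labeled 1 and the second labeled 0.
z = [log_odds(0.5, [2.0, -1.0], [1.0, 0.5]), log_odds(0.5, [2.0, -1.0], [-1.0, 1.0])]
print([round(sigmoid(v), 3) for v in z])
print(round(log_loss([1, 0], z), 4))
```

Confident predictions on the correct side of 0.5 contribute little to the loss, while confident predictions on the wrong side are penalized heavily.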

3.2.4. K-nearest neighbor

K-nearest neighbor (KNN) is a classification algorithm that labels a data point according to the labels of its k nearest neighbors, using the Euclidean distance formula to evaluate the distance. The Euclidean distance d between points A (x1, y1) and B (x2, y2) in a two-dimensional plane is shown in Eq. (12).

$$d\left(A,B\right)=\sqrt{{\left({x}_{2}-{x}_{1}\right)}^{2}+{\left({y}_{2}-{y}_{1}\right)}^{2}}$$

To distinguish between normal and abnormal matches, the current study set k to 2 and labeled each array as a normal or abnormal match according to the dividend yield pattern of that match. After the dividend yield of a new match was obtained, its array pattern was compared with the labeled patterns to determine whether it was closer to normal or abnormal matches.
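
A minimal sketch of KNN classification with Euclidean distance is given below. The training points are invented two-dimensional examples rather than dividend yield arrays, and the tie-break rule (the closest neighbor among the tied labels wins) is an assumption that matters when k is even, as with k = 2.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train: List[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int) -> str:
    """Label the query by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    best = max(votes.values())
    # Assumed tie-break: walk neighbors from nearest to farthest and
    # return the first label that reached the highest vote count.
    for _, label in neighbors:
        if votes[label] == best:
            return label
    raise ValueError("training set is empty")


# Invented toy data: two clusters with placeholder labels.
train = [((0.0, 0.0), "normal"), ((0.5, 0.2), "normal"),
         ((5.0, 5.0), "abnormal"), ((5.5, 4.8), "abnormal")]

print(euclidean((0.0, 0.0), (3.0, 4.0)))
print(knn_predict(train, (0.3, 0.1), k=2))
print(knn_predict(train, (5.2, 5.1), k=2))
```

For the 300-point match arrays used in the study, the same distance function applies unchanged, since it works for any dimension.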

3.3. Data preprocessing

This study used hourly win-tie-loss dividend yield data to classify abnormal and normal matches. The learning data were based on 2,586 normal and 21 abnormal matches. The match dividend data are shown in Fig. 2. The x-axis represents “Time,” with a value assigned to each time step, and the y-axis represents the win-tie-loss dividend yield value.

Before learning, we checked whether the length of the dividend yield data differed between matches. For instance, there may be 50 data points for match A and 80 points for match B. Such differences in data dimensions hinder the model’s learning process. Therefore, data dimensions should be equalized before learning. Given the average data length of 80 to 100, the length of every dividend series was adjusted to 100, in an analyzable form, before smoothing and implementation by adding a sine value. Figure 3 shows the data dimension adjustment to 100 without changing the overall dividend yield graph pattern.
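
One way to bring series of different lengths to a common length of 100 is linear interpolation, sketched below. This is an assumption for illustration: the text does not specify the resampling method, and the sine-based smoothing step is not reproduced here.

```python
from typing import List, Sequence


def resample(series: Sequence[float], target_len: int = 100) -> List[float]:
    """Resample a series to target_len points by linear interpolation.

    The first and last values are preserved, so the overall shape of the
    curve is kept while the number of points changes.
    """
    n = len(series)
    if n == 0:
        raise ValueError("series must not be empty")
    if n == 1 or target_len == 1:
        return [float(series[0])] * target_len
    out = []
    for j in range(target_len):
        pos = j * (n - 1) / (target_len - 1)  # fractional index into the original
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(series[lo] * (1.0 - frac) + series[hi] * frac)
    return out


# Hypothetical matches with 50 and 80 recorded odds values.
match_a = [2.0 + 0.01 * i for i in range(50)]
match_b = [1.8 - 0.005 * i for i in range(80)]

print(len(resample(match_a)), len(resample(match_b)))
```

After this step every match has the same dimension, so the series can be stacked into a single training matrix.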

As shown in Fig. 3, the dimension of each of the win, tie, and loss dividend yield series was adjusted. Figure 4 represents an abnormal match used during learning in which a given dividend shows no change. When the win, tie, and loss dividend yields are learned individually, one series of an abnormal match, such as its loss pattern, can be considered that of a normal match. Consequently, the learning model can judge an abnormal match to be normal when the three patterns are applied simultaneously as separate series.

To address this problem, the win, tie, and loss data sets, each with a length of 100, were combined into a single data set with a length of 300. Figure 5 shows the result. The three types of dividend yields shown in Fig. 4 were combined to form a single pattern, which in turn emphasized the characteristics of abnormal matches, for which data are scarce, during learning.
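
The combination step can be sketched as a simple concatenation of the three equal-length series. The ordering win, tie, loss and the function name are assumptions made for this illustration.

```python
from typing import List, Sequence


def combine_odds(win: Sequence[float], tie: Sequence[float],
                 loss: Sequence[float]) -> List[float]:
    """Concatenate three equal-length series into one feature vector."""
    if not (len(win) == len(tie) == len(loss)):
        raise ValueError("win, tie, and loss series must have the same length")
    return list(win) + list(tie) + list(loss)


# Placeholder series of length 100 each (constant values for brevity).
win = [1.9] * 100
tie = [3.2] * 100
loss = [4.1] * 100

features = combine_odds(win, tie, loss)
print(len(features))  # 300
```

With a single 300-point vector per match, a model sees the three odds movements jointly rather than as three unrelated inputs.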

3.4. Abnormal betting detection model

An abnormal match detection model was developed based on the data analysis results of each model, using the match result dividend data. Based on the betting pattern analysis results from the five models, the abnormal betting detection model classified each match according to the number of models that flagged it as abnormal: one or fewer, normal; two, caution; three, danger; and four or more, abnormal. Figure 6 shows the model’s classification process, which provides a dividend pattern to help detect abnormal matches.
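
The four-level rule can be written as a small function that counts how many of the five models flag a match. The dictionary keys and the example flags below are hypothetical; only the thresholds come from the text.

```python
from typing import Mapping


def risk_level(flags: Mapping[str, bool]) -> str:
    """Map the number of models flagging a match as abnormal to a risk level."""
    n_abnormal = sum(1 for flagged in flags.values() if flagged)
    if n_abnormal <= 1:
        return "normal"
    if n_abnormal == 2:
        return "caution"
    if n_abnormal == 3:
        return "danger"
    return "abnormal"  # four or more models agree


# Hypothetical per-model decisions for one match.
decisions = {"LR": True, "RF": False, "SVM": True, "KNN": False, "Ensemble": False}
print(risk_level(decisions))  # caution
```

Requiring agreement among several models before raising the level reduces the chance that a single model's error marks a valid match as suspicious.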

3.5. Data analysis

The present study proceeded with machine learning using five performant classification models: LR, RF, SVM, KNN, and the ensemble model, which was an optimized version of the previous four models. These were used to classify normal and abnormal matches by learning their patterns from sports dividend yield data. The data set included 2,607 matches (2,586 normal and 21 abnormal). The RF, KNN, and ensemble models recorded a high accuracy of over 92%, whereas the LR and SVM models were approximately 80% accurate (Table 2).

Table 2

Model results.
Model	Threshold	Accuracy	Sensitivity	Specificity
LR	0.5	0.817	0.857	0.817
LR	0.01	0.788	0.714	0.789
SVM	0.5	0.784	0.857	0.784
SVM	0.008	0.778	0.571	0.780
RF	0.5	0.932	0.429	0.937
RF	0.01	0.799	0.857	0.799
KNN	0.5	0.92	0.118	0.980
KNN	0.01	0.86	0.318	0.901
Ensemble	0.5	0.931	0.054	0.990
Ensemble	0.008	0.934	0.155	0.993
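
The three metrics reported in Table 2 follow from the counts of a confusion matrix, and a short sketch shows why they should be read together. The example below is a hypothetical classifier that labels every one of the 2,607 matches as normal; it is not one of the models in Table 2, and it treats "abnormal" as the positive class.

```python
from typing import Dict


def metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from confusion counts.

    tp/fn count abnormal matches (positive class) detected/missed;
    tn/fp count normal matches correctly passed/wrongly flagged.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical baseline: every match is predicted normal
# (21 abnormal matches missed, 2,586 normal matches passed).
baseline = metrics(tp=0, fn=21, tn=2586, fp=0)
print({name: round(value, 3) for name, value in baseline.items()})
```

Because abnormal matches make up fewer than 1% of the data, such a baseline reaches an accuracy above 0.99 while detecting no abnormal match at all, which is why sensitivity is reported alongside accuracy.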


Five models were tested using data from 20 matches (10 normal and 10 abnormal). K-league football matches and match-fixing cases between 2000 and 2020 were used as data sources. Based on the dividend yields of the 20 matches, 8 of the 10 normal matches were correctly classified as normal, while the remaining two were rated “caution” in the LR, RF, and ensemble models. Of the 10 abnormal matches, 6 were correctly classified as abnormal, 2 were rated “caution,” and 2 were rated “normal.” Regular dividend yield patterns in some abnormal matches would have produced these decisions. Consequently, the model in the current study was 80% accurate for normal matches and 60% accurate for abnormal matches; the lower figure is attributed to the lack of abnormal-match data, which prevented the model from classifying such matches accurately.

In the realm of sports, match-fixing issues tend to occur constantly and damage the fundamental value of fairness in sports. Various methods have been proposed to solve this problem. Efforts have been made in sports to build a match-fixing anomaly-detection model using match data.

In the academic world, studies have attempted to detect match-fixing using anomalous match data. Kim et al. [26] converted sports dividend yield data into graphs and applied the CNN algorithm to sort normal and abnormal matches by comparing their dividend yield graphs. Ötting et al. [24] used the GAMLSS model based on dividend yield and betting volume data to identify differences between fixed and non-fixed matches and evaluated the model’s ability to detect fixed matches.

Previous studies have examined suspected matches using a single model based on football match dividend yield data, with an accuracy rate of 70–80%. The misclassification rate was approximately 20%. However, inevitable biases and errors in single-model analyses hinder their practical application. Consequently, the current study aimed to suggest a solution to sports match-fixing using various AI models to detect anomalies based on dividend yields by constructing a database with such variables as sports match results, league ranking, and players.

To reduce the errors of a single model, this study relied on four models frequently used in machine learning: LR, RF, SVM, and KNN classification. In addition, this study used the ensemble model, which is an optimized model of the previous four. Using these five models, this study aimed to distinguish between normal and abnormal matches. The accuracy of the present results was higher than that in previous research on sorting matches, with three models (RF, KNN, and ensemble) showing an accuracy of over 90% and two models (LR and SVM) showing an accuracy of approximately 80%. A combination of the models was used to identify suspicious matches, as each model suggests different suspicious cases, reducing the likelihood of considering valid matches as suspicious.

Moreover, after collecting data from real-time matches, we applied five models to construct a system capable of detecting match-fixing in real-time. The models are built on previous match data and collect real-time match data to ascertain fraudulent matches. Our study aimed to provide an environment for real-time analysis and investigation by building a system that collects real-time data before and during matches, decides whether a match is suspicious, and acts promptly.

However, previous research on data-based statistical detection of match-fixing revealed that match-fixing cases are relatively few compared to normal matches [18], which the current study confirmed. Real-time data collection on sports matches could contribute to the creation of a more accurate detection system.

Determining whether a match is fixed cannot rely solely on abnormal patterns and data [22]. However, the detection model could help identify abnormal and normal matches in real-time and provide more detailed data to facilitate the investigation of match-fixing cases. Moreover, it could benefit the public, as these real-time data would detect match-fixing in games. Furthermore, the detection model may prevent match-fixing brokers and players from committing match-fixing, as they are aware of the risk of real-time detection. The results of this study could guide the future detection of match-fixing in sports.

Acknowledgments: The authors would like to thank all the practitioners and clubs for their time and support.

Author Contributions: Conceptualization, Lee. and Park.; methodology, Kim.; software, Lee. and Kim; validation, Kim., Park. and Lee.; formal analysis, Park.; investigation, Lee.; resources, Lee and Kim.; data curation, Kim.; writing—original draft preparation, Park.; writing—review and editing, Lee.; visualization, Kim.; supervision, Lee. and Park.; project administration, Park.; All authors have read and agreed to the published version of the manuscript.

Data Availability Statement: The data that support the study’s findings are available from iSports API (https://www.isportsapi.com/). The authors do not have permission to share the data. However, upon a reasonable request, the data can be made available for research purposes after the researchers get consent from iSports.

Funding: This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2020S1A5A2A03044544).

Conflicts of Interest: The authors declare no conflict of interest.

Corresponding author: Correspondence to Ji-Yong Lee

Renson, R. Fair play: Its origins and meanings in sport and society. Kinesiology 41, 5–18 (2009).
Weatherill, S. ‘Fair play please!’: Recent developments in the application of EC law to sport. Common Mark. Law Rev. 40, 51–93 (2003).
Cisneros, J. Leveling the e-sports playing field: An argument in favor of government regulation to ensure fair player contracts for young professional gamers in e-sports. Cal. W. L. Rev. 58, article 5 (2021).
Gonzalo-Skok, O., Sánchez-Sabaté, J., Izquierdo-Lupón, L. & Sáez de Villarreal, E. Influence of force-vector and force application plyometric training in young elite basketball players. Eur. J. Sport Sci. 19, 305–314 (2019). https://doi.org/10.1080/17461391.2018.1502357
Panchuk, D., Klusemann, M. J. & Hadlow, S. M. Exploring the effectiveness of immersive video for training decision-making capability in elite, youth basketball players. Front. Psychol. 9, article 2315 (2018). https://doi.org/10.3389/fpsyg.2018.02315
Loland, S. Caster Semenya, athlete classification, and fair equality of opportunity in sport. J. Med. Ethics 46, 584–590 (2020). https://doi.org/10.1136/medethics-2019-105937
Holden, J. T., McLeod, C. M., & Edelman, M. Regulatory categorization and arbitrage: How daily fantasy sports companies navigated regulatory categories before and after legalized gambling. Am. Bus. Law J. 57, 113–167 (2020). https://doi.org/10.1111/ablj.12156
Moriconi, M. & De Cima, C. Betting practices among players in Portuguese championships: From cultural to illegal behaviours. J. Gambl. Stud. 36, 161–181 (2020). https://doi.org/10.1007/s10899-019-09880-x
Cadwallader, A. B., de la Torre, X., Tieri, A. & Botrè, F. The abuse of diuretics as performance-enhancing drugs and masking agents in sport doping: Pharmacology, toxicology and analysis. Br. J. Pharmacol. 161, 1–16 (2010). https://doi.org/10.1111/j.1476-5381.2010.00789.x
Loland, S. Performance-enhancing drugs, sport, and the ideal of natural athletic performance. AJOB 18, 8–15 (2018). https://doi.org/10.1080/15265161.2018.1459934
Park, J.-H., Choi, C.-H., Yoon, J. & Girginov, V. How should sports match fixing be classified? Cog. Soc. Sci. 5, (2019). https://doi.org/10.1080/23311886.2019.1573595
Van der Hoeven, S., De Waegeneer, E., Constandt, B. & Willem, A. Match-fixing: Moral challenges for those involved. Ethics Behav. 30, 425–443 (2020). https://doi.org/10.1080/10508422.2019.1667238
Carpenter, K. Match-fixing—The biggest threat to sport in the 21st century? Int. Sports Law Rev. 2, 13–24 (2012).
Andreff, W. French professional football: How much different? In Handbook on the economics of professional football (eds. J. Goddard, & P. Sloane) 298–321 (Edward Elgar Publishing, 2014).
Rodenberg, R. & Feustel, E. D. Forensic sports analytics: Detecting and predicting match-fixing in tennis. J. Pred. Markets 8, 77–95 (2014). https://doi.org/10.5750/jpm.v8i1.866
Kim, Y.-W., Han, J. & Choi, S.-R. Detection of possible match-fixing in tennis games. 6th Int. Cong. Sport Sci. Res. Technol. Support. https://www.scitepress.org/Papers/2018/69242/69242.pdf (2018).
Tak, M., Sam, M. P. & Choi, C. H. Too much at stake to uphold sport integrity? High-performance athletes’ involvement in match-fixing. Crime Law Soc. Change 74, 27–44 (2020). https://doi.org/10.1007/s10611-020-09887-1
Lee, J.-Y., Park, J.-H., Yoon, J.-W. & Yun, H.-J. Detect on unexpected betting with monte-Carlo simulation: The relationship between the winning rate and sports odds of men’s professional basketball. Korean J. Meas. Eval. Phys. Educ. Sport Sci. 22, 55–56 (2020).
Dixon, M. J. & Coles, S. G. Modelling association football scores and inefficiencies in the football betting market. J. R. Stat. Soc. Series C-Appl. Stat. 46, 265–280 (1997). https://doi.org/10.1111/1467-9876.00065
Forrest, D. & McHale, I. G. Using statistics to detect match fixing in sport. IMA J. Manag. Math. 30, 431–449 (2019). https://doi.org/10.1093/imaman/dpz008
Archontakis, F. & Osborne, E. Playing it safe? A Fibonacci strategy for soccer betting. J. Sports Econ. 8, 295–308 (2007). https://doi.org/10.1177/1527002506286775
Van Rompuy, B. The odds of match fixing: Facts & figures on the integrity risk of certain sports bets. SSRN Electron. J. (2015). https://doi.org/10.2139/ssrn.2555037
Marchetti, F., Reppold Filho, A. R. & Constandt, B. At risk: Betting-related match-fixing in Brazilian football. Crime Law Soc. Change 76, 431–450 (2021). https://doi.org/10.1007/s10611-021-09971-0
Ötting, M., Langrock, R. & Deutscher, C. Integrating multiple data sources in match-fixing warning systems. Stat. Modelling 18, 483–504 (2018). https://doi.org/10.1177/1471082X18804933
Forrest, D. & McHale, I. G. Gambling and problem gambling among young adolescents in Great Britain. J. Gambl. Stud. 28, 607–622 (2012). https://doi.org/10.1007/s10899-011-9277-6
Kim, C., Park, J. H., Kim, D. & Lee, J. Y. Detectability of sports betting anomalies using deep learning-based ResNet: Utilization of K-League data in South Korea. Ann. Appl. Sport Sci. Article e1158 (2022).
Park, S. & Chang, Y.-C. The ethical sensitivity level of domestic badminton athletes for match-fixing. Sports Sci. 39, 395–402 (2021). https://doi.org/10.46394/ISS.39.3.45

No competing interests reported.


Editorial decision: Major revision
04 Jul, 2023
Reviews received at journal
01 Jul, 2023
Reviewers agreed at journal
29 Jun, 2023
Reviewers agreed at journal
15 May, 2023
Reviews received at journal
03 May, 2023
Reviewers agreed at journal
24 Apr, 2023
Reviewers invited by journal
22 Apr, 2023
Editor assigned by journal
22 Apr, 2023
Editor invited by journal
18 Apr, 2023
Submission checks completed at journal
18 Apr, 2023
First submitted to journal
10 Apr, 2023

