2.1 Overview of LLM models
Generative Pre-trained Transformers (GPTs) are a series of multimodal, advanced artificial intelligence models designed to understand input and generate responses the way humans do. These models have been used in various domains such as translation, content writing, chatbots, image generation, and video generation.
2.1.1 GPT based models
Large language models such as GPT can be used to create fake content and misinformation. The paper (Chen, et al., 2023) discusses whether AI-generated misinformation is more harmful than misinformation written by humans, finding that LLM-generated misinformation is more difficult to detect and investigate. The study also found that AI-generated misinformation is more deceptive, posing a significant challenge to online safety and public trust. “GPT-who” (Venkatraman, et al., 2023), an LLM-generated text detector based on Uniform Information Density (UID), builds on the hypothesis that humans generally distribute information evenly across their communication. GPT-who uses this concept to distinguish human-written from AI-generated text. The paper claims that it outperforms other state-of-the-art detection models by around 20% on many performance metrics and provides more interpretable representations of the text.
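The UID intuition can be illustrated with a simple statistic: if per-token surprisal (negative log probability under a language model) is spread evenly, its variance is low. The sketch below is a minimal illustration of that signal only, not GPT-who's actual feature set, and it assumes the surprisal values have already been computed by some scoring model.

```python
def uid_variance(surprisals):
    """Variance of per-token surprisal (-log p per token).

    Under the UID hypothesis, human writing spreads information evenly,
    so a lower variance is the kind of uniformity signal a UID-based
    detector can exploit. Illustrative only.
    """
    mean = sum(surprisals) / len(surprisals)
    return sum((s - mean) ** 2 for s in surprisals) / len(surprisals)


# Perfectly uniform surprisal gives zero variance; uneven surprisal does not.
print(uid_variance([2.0, 2.0, 2.0]))  # -> 0.0
print(uid_variance([1.0, 3.0]))       # -> 1.0
```

A real detector would combine several such uniformity statistics rather than rely on one variance value.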
2.1.2 Gemini model
Gemini is an advanced chatbot model built by Google DeepMind. The author (Saeidnia & Hamid, 2023) examined the motivation behind the development of Google’s chatbot Gemini and its potential impact on the IT industry. Gemini is designed to provide a better user experience through personalized, relevant information and improved customer service. The model was built to transform how users access and interact with information on the internet. We used Gemini’s API to generate text and also to paraphrase our dataset.
2.1.3 Necessity of LLM generated text detection
The impact of LLM models on academia and student learning in higher education has been examined through a content analysis of 100 articles from Australia, New Zealand, the United States, and the United Kingdom (Sullivan, et al., 2023). The study explores the benefits and risks of using generative AI tools and the need to change teaching styles. Assignments may need to adapt to this new reality, as the performance of ChatGPT relative to human experts in various domains is improving day by day, as shown in (Guo, et al., 2023). The study (Khalil, et al., 2023) evaluates LLMs’ ability to perform theory-of-mind tasks and suggests that, as LLMs improve, their language skills may lead to the emergence of human-like cognitive abilities. (Junchao and Yang, et al., 2023) emphasize the importance of building detectors for LLM-generated text to guard against potential misuse and to protect domains such as artistic expression and social networks. They discuss recent advancements in detection techniques and the challenges faced, such as out-of-distribution problems and data ambiguity. It is necessary not only to detect generated text but also to detect hallucinations and unreliable answers from LLMs.
(Chen, et al., 2023) and (Guo, et al., 2023) discuss how LLMs can be deceived into generating false information, while also noting that LLMs make statistical mistakes that can prove costly in many fields. A robust model for discerning whether information provided by an LLM is reliable is therefore critical for their safe and effective use.
2.1.4 Challenges in detection of LLM generated text
Identifying text written by large language models presents several challenges for researchers and practitioners. First, the out-of-distribution problem results in false negatives: detection models cannot identify novel text, i.e., text unlike anything in their training data. In addition, the text can be deceptively similar in style to human prose (Chen, et al., 2023), making it difficult for both machine-based systems and humans to determine that it was written by a machine. This becomes even more challenging when the misinformation itself is produced by an LLM.
Another challenge was the ineffectiveness of traditional plagiarism-detection systems (Khalil, et al., 2023): they were unable to flag LLM-generated material because it is new and original. Moreover, adversaries can introduce small changes to a text to trick the detection system into misclassifying it, as shown in (Pu, et al., 2023). A limitation noted in (Guo, et al., 2023) is the difficulty of modelling LLM-generated text with logistic regression: content generated by LLMs is complex and varied, so understanding it requires deep insight into the factors that influence performance and the ability to adapt to those factors across contexts. These challenges necessitate highly efficient detection methods that can adapt to the growing capabilities of LLMs.
2.2 Text detection algorithms
2.2.1 Double stream models
The OUTFOX framework (Koike, et al., 2024) improves detection by pairing a detector with an attacker model; the two iteratively learn from each other’s outputs through in-context learning. The detector uses human-written essays, attacker-generated texts, and regular LLM-generated texts as in-context examples to learn to detect the attacker’s output, while the attacker uses the detector’s predictions to learn to generate adversarial text that the detector cannot easily flag. The authors constructed a dataset of 15,400 triplets of essay prompts, human-written essays, and LLM-generated essays to train and evaluate the model.
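The in-context detection step above amounts to assembling labelled demonstrations into a prompt for the detector LLM. The sketch below shows one plausible way to build such a prompt; the wording, field names, and labels are assumptions for illustration, not OUTFOX's exact prompt format.

```python
def build_detection_prompt(examples, target):
    """Assemble in-context demonstrations plus an unlabelled target.

    examples: list of (text, label) pairs, e.g. human-written,
              LLM-generated, and attacker-generated essays.
    target:   the essay the detector LLM should label.
    """
    parts = [f"Essay: {text}\nLabel: {label}" for text, label in examples]
    parts.append(f"Essay: {target}\nLabel:")  # model completes the label
    return "\n\n".join(parts)


demos = [("First sample essay...", "Human"), ("Second sample essay...", "LM")]
prompt = build_detection_prompt(demos, "Essay to classify...")
```

In the adversarial loop, the attacker would receive the detector's outputs on its own generations and use them as demonstrations of what gets caught.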
In recent work, (Abburi, et al., 2023) investigated the tasks of detecting AI-generated text and attributing it to a specific large language model. They proposed an ensemble that passes the input text through pre-trained large language models such as BERT, RoBERTa, and DeBERTa and uses the models’ output probabilities as input features for traditional machine-learning classifiers such as SVM and logistic regression. For the binary detection task, the ensemble achieved a micro F1 score of 0.733 on English data and 0.649 on Spanish data. The approach also performed well on the multiclass task of attributing LLM-generated text to one of six language models, ranking 1st with a micro F1 of 0.625 in English and 0.653 in Spanish.
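The stacking idea can be sketched in a few lines: each base model's predicted probability becomes one feature, and a simple downstream classifier makes the final call. The model names and probability values below are placeholders, and the averaging rule stands in for the SVM or logistic regression the paper actually fits on these features.

```python
def stack_features(prob_by_model):
    """Concatenate each base model's P(AI-generated) into a feature vector.

    Sorting by model name keeps the feature order deterministic, which a
    trained downstream classifier would rely on.
    """
    return [prob_by_model[name] for name in sorted(prob_by_model)]


def ensemble_predict(features, threshold=0.5):
    """Toy stand-in for the traditional classifier: average and threshold."""
    return "ai" if sum(features) / len(features) >= threshold else "human"


# Hypothetical per-model probabilities for one input text.
probs = {"bert": 0.91, "roberta": 0.85, "deberta": 0.78}
print(ensemble_predict(stack_features(probs)))  # -> ai
```

In practice the feature vector would feed `fit`/`predict` of a real classifier trained on labelled data rather than a fixed threshold.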
2.2.2 Multi feature detection
The multi-feature detection approach in (Wu, et al., 2023) combines several signals to improve the accuracy and reliability of zero-shot detection. The model integrates multiple data attributes from several sources and uses advanced algorithms to analyse them and find complex patterns. The zero-shot model uses different statistical measures, such as log-likelihood, log-rank, and entropy, for better interpretation. This approach is particularly valuable in scenarios where a single feature may not be enough, such as in medicine, where multiple patient data points must be considered, or in cybersecurity, where a wide range of indicators must be analysed to find a threat.
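The three statistics named above have standard zero-shot-detection definitions: mean per-token log-likelihood, mean log of the observed token's rank, and the mean entropy of the model's predictive distribution. The sketch below computes them from quantities assumed to be exported by a scoring language model; it is an illustration of the features, not the paper's full pipeline.

```python
import math

def zero_shot_features(token_probs, token_ranks, dists):
    """Compute three common zero-shot detection features.

    token_probs: model probability assigned to each observed token
    token_ranks: rank of each observed token in the model's prediction (1 = top)
    dists:       full predictive distribution at each position
    """
    log_likelihood = sum(math.log(p) for p in token_probs) / len(token_probs)
    log_rank = sum(math.log(r) for r in token_ranks) / len(token_ranks)
    entropy = sum(-sum(p * math.log(p) for p in d if p > 0) for d in dists) / len(dists)
    return log_likelihood, log_rank, entropy
```

Machine-generated text typically shows higher log-likelihood and lower log-rank than human text under the scoring model, which is why combining the features helps when any single one is ambiguous.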
2.2.3 Adversarial learning
RADAR (Hu, et al., 2023) aims to identify machine-generated text using adversarial learning with generative adversarial networks (GANs). As a classifier, RADAR utilizes a diversity-enhancing GAN that employs a unidirectional LSTM as the discriminator. This approach incorporates rewards at both the word and sentence levels during training, treating text generation as a decision-making process: the text generated so far is the state, the next word is the action, and Monte Carlo search is used to gather feedback from the discriminator. Conversely, OUTFOX (Koike, et al., 2024) enhances detection by employing both a detector and an attacker model that learn from each other through in-context learning. The detector learns to distinguish among human-written essays, attacker-generated texts, and regular language-model-generated texts using in-context examples, while the attacker leverages feedback from the detector to create text that evades detection.
The authors of (Mitchell, et al., 2023) introduced a novel method to detect LLM-generated text. They observed a distinctive property of an LLM’s probability function: samples generated by the model tend to lie in regions of negative curvature of the model’s log-probability function. They used this insight to develop a curvature-based criterion called “DetectGPT”. It requires no separate classifier or training dataset; it only computes log probabilities under a pre-trained language model. The results showed that DetectGPT outperformed existing zero-shot models, achieving 0.95 AUROC compared to 0.81 for the zero-shot baseline.
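The curvature criterion reduces to a simple comparison: the log probability of the candidate text against the average log probability of several perturbed versions of it. The sketch below shows that score; the log probabilities are assumed to come from a scoring language model, and the perturbations (in the paper, produced by a mask-filling model) are outside this snippet.

```python
def perturbation_discrepancy(log_p_original, log_p_perturbed):
    """DetectGPT-style score: log p(x) minus the mean log p of perturbations.

    Machine-generated text sits near a local maximum of the model's
    log-probability surface, so perturbing it lowers log p sharply and the
    score comes out large; human text yields a score near zero.
    """
    return log_p_original - sum(log_p_perturbed) / len(log_p_perturbed)


# Hypothetical values: perturbations drop the log probability by ~2 nats.
score = perturbation_discrepancy(-10.0, [-12.0, -11.0, -13.0])
print(score)  # -> 2.0
```

Classification then thresholds this score, with the threshold chosen on a validation set.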
2.2.4 Comparative analysis of different multilingual models
A recent study by (Orenstrakh, et al., 2023) explored methods for detecting LLM-generated text in the context of academic integrity. The authors compiled a dataset of student-written text along with ChatGPT-generated text and ran it through widely used detectors, including CopyLeaks, GPTKit, GLTR, and GPTZero, for a comparative analysis. The most accurate detector was CopyLeaks, shown in Table (I), which had the highest precision in identifying LLM-generated text; GPTKit was the most effective at reducing false positives, and GLTR was the most resilient. However, the study also highlighted concerns regarding the high false-positive rate of GPTZero, as well as the reduced accuracy of all detectors when faced with code, non-English content, and paraphrased submissions.
Table (I): Overall accuracy of LLM-generated text detectors

Detectors          | Human Data | ChatGPT Data
CopyLeaks          | 99.12%     | 95.00%
GPT2Detector       | 98.25%     | 95.00%
CheckForAI         | 98.25%     | 95.00%
GLTR               | 82.46%     | 95.00%
GPTKit             | 100.00%    | 75.00%
OriginalityAI      | 93.86%     | 70.00%
AI Text Classifier | 94.74%     | 60.00%
GPTZero            | 54.39%     | 45.00%