Large language models (LLMs) such as GPT-3, GPT-4, Llama, PaLM, and Bard are trained on web-scale corpora that can inadvertently include the benchmark datasets later used to evaluate them. This overlap poses a significant challenge during evaluation: contaminated models may report inflated accuracy that reflects familiarity with the benchmark data rather than genuine capability. This research therefore addresses the critical issue of data contamination in these influential LLMs. The central question is whether their demonstrated proficiency reflects a genuine understanding of language or an overreliance on benchmark datasets encountered during training. The primary aim is a systematic investigation to determine the extent of data contamination in these models. Our methodology comprises three complementary probes, sketched below: directly questioning the model about instances from a benchmark's training data, posing quiz-like inquiries that test comprehension beyond rote memorization, and presenting partial instances for predictive completion. Through these probes, the study aims to characterize potential contamination in popular LLMs, contributing to a deeper understanding of their capabilities and limitations in natural language processing.
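The following is a minimal sketch of how the three probes could be framed as prompts. The `query_model` wrapper, dataset names, and prompt wording are illustrative assumptions, not the paper's exact protocol; any chat or completion API can be substituted.

```python
def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM chat/completion endpoint."""
    raise NotImplementedError("Plug in the client for the model under test.")


def direct_probe(dataset_name: str, split: str) -> str:
    # Probe 1: ask the model outright whether it has seen the benchmark split.
    prompt = (
        f"Have instances from the {split} split of the {dataset_name} dataset "
        "appeared in your training data? Answer yes or no."
    )
    return query_model(prompt)


def quiz_probe(dataset_name: str, instance: str, candidates: list[str]) -> str:
    # Probe 2: quiz-style question requiring recall of dataset-specific details
    # rather than general language understanding.
    options = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    prompt = (
        f"Which option is the original label or continuation of this "
        f"{dataset_name} instance?\n\nInstance: {instance}\n\nOptions:\n{options}"
    )
    return query_model(prompt)


def completion_probe(dataset_name: str, partial_instance: str) -> str:
    # Probe 3: supply the first part of a benchmark instance and ask the model
    # to reproduce the remainder; a near-exact match suggests memorization.
    prompt = (
        f"The text below is the beginning of an instance from the "
        f"{dataset_name} dataset. Complete it exactly as it appears in the "
        f"dataset:\n\n{partial_instance}"
    )
    return query_model(prompt)
```

In such a setup, the completion probe's output would typically be compared against the held-out remainder of the instance (e.g., by exact or fuzzy string match) to score potential memorization.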