As the use of Large Language Models (LLMs) becomes more widespread, it is essential to secure their responses against adversarial attacks. These attacks manipulate LLMs into producing harmful or misleading information, posing significant risks across many applications. This paper therefore investigates the use of LLMs as safety judges of responses generated by other LLMs. The proposed architecture consists of generators, evaluators, and a judge. Four LLM generators produce responses to a user-provided question. The question and the responses are then passed to the evaluators, which score each response according to its safety level. Finally, the judge selects the best response based on the evaluators' output. This architecture is used to test two approaches: one with a single LLM evaluator and another with four fine-tuned, specialized LLM evaluators. The pipeline achieved 88% accuracy with the single-evaluator approach and 83% accuracy with the four-evaluator approach. The results demonstrate the potential of LLMs as judges of other LLM-generated responses and offer a promising direction for enhancing the reliability and security of AI systems in adversarial environments.
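The generator-evaluator-judge pipeline can be illustrated with a minimal Python sketch under stated assumptions: the model calls below are stubs, and all function and model names are hypothetical placeholders rather than the paper's actual implementation; the judge here simply picks the response with the highest average safety score, which is one plausible reading of the selection step.

```python
# Minimal sketch of the generator-evaluator-judge pipeline described above.
# All LLM calls are stubs; in practice each would wrap a real model API.
from dataclasses import dataclass


@dataclass
class ScoredResponse:
    generator: str
    text: str
    safety_score: float  # higher = safer


def generate(generator_name: str, question: str) -> str:
    # Stub: replace with a call to the generator LLM.
    return f"[{generator_name}] answer to: {question}"


def evaluate_safety(evaluator_name: str, question: str, response: str) -> float:
    # Stub: replace with a call to the evaluator LLM, which returns
    # a safety score for the (question, response) pair.
    return 0.5


def judge(question: str, generators: list[str], evaluators: list[str]) -> ScoredResponse:
    """Generate one response per generator, score each with every evaluator,
    and return the response with the highest average safety score."""
    scored = []
    for g in generators:
        response = generate(g, question)
        scores = [evaluate_safety(e, question, response) for e in evaluators]
        scored.append(ScoredResponse(g, response, sum(scores) / len(scores)))
    return max(scored, key=lambda r: r.safety_score)


if __name__ == "__main__":
    best = judge(
        "How do I reset my router password?",
        generators=["gen-A", "gen-B", "gen-C", "gen-D"],
        evaluators=["safety-evaluator"],  # single evaluator; use four for the other approach
    )
    print(best.generator, best.safety_score)
```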