Current AI systems have shown impressive results in the automatic synthesis of high-resolution, realistic images from text descriptions. In particular, Generative Adversarial Networks (GANs) are a powerful technique that trains two models jointly: a generator that synthesizes realistic images and a discriminator that distinguishes synthesized images from real ones. Most text-to-image generation frameworks leverage GANs to generate realistic images conditioned on text descriptions. In this paper, we fuse DF-GAN, a simple and efficient text-to-image generation framework, with the AraBERT architecture to generate images conditioned on Arabic text descriptions. First, we create new datasets suited to the Arabic text-to-image generation task by translating the English text descriptions into Arabic with DeepL Translator. Second, we leverage AraBERT, which is pre-trained on billions of Arabic words, to produce strong sentence embeddings, and we reduce the dimension of each embedding vector to match the shape DF-GAN expects. Third, we inject the reduced sentence embedding into the DF-GAN framework to generate high-resolution, realistic, text-matching images conditioned on Arabic text descriptions. As in previous work, we use the CUB and Oxford-102 Flowers datasets. We evaluate our framework with the Fréchet Inception Distance (FID) and the Inception Score (IS). To the best of our knowledge, our framework is the first to succeed in generating high-resolution, realistic, text-matching images conditioned on Arabic text.
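The dimension-reduction step described above can be sketched as a simple linear projection. This is a minimal illustration, not the paper's implementation: it assumes AraBERT produces a 768-dimensional sentence embedding (the standard BERT-base hidden size) and that DF-GAN expects a 256-dimensional conditioning vector; both sizes, and the random stand-in for the real embedding, are assumptions for illustration.

```python
import numpy as np

# Stand-in for an AraBERT sentence embedding; in the real pipeline this
# would come from the pretrained AraBERT encoder (BERT-base hidden size 768).
arabert_embedding = np.random.randn(768).astype(np.float32)

# Learned linear projection mapping the 768-dim embedding down to the
# dimension DF-GAN's conditioning input expects (assumed here to be 256).
rng = np.random.default_rng(0)
projection = (rng.standard_normal((768, 256)) * 0.02).astype(np.float32)

reduced = arabert_embedding @ projection
print(reduced.shape)  # (256,)
```

In practice the projection weights would be trained jointly with (or fitted for) the generator rather than drawn at random; the sketch only shows how the shapes line up before the embedding is injected into DF-GAN.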