Generative Adversarial Networks (GANs) and Diffusion-based Text-to-Image Synthesis Framework for Enhancing English Vocabulary Learning in Early Childhood Education

Authors

  • L. S. S. T. Dharmarathna
  • L. P. T. Pavith
  • S. C. Ranaweera
  • O. G. Yugani Navodya Gamlath

Keywords

Chatbots, Diffusion models, Generative adversarial networks, Large language models, Text-to-image

Abstract

This study presents the development of a text-to-image synthesizer that combines generative adversarial network (GAN) and diffusion techniques. To transform natural language into visually meaningful images, the system draws on the complementary strengths of GANs and a diffusion model, generating diverse, high-quality images. A large language model (LLM) based chatbot is also integrated into the system, allowing users to interactively refine the image generation process. A web-based application lets users converse with the chatbot and receive the generated images. By combining GANs, diffusion models, and LLMs, the project addresses limitations of existing text-to-image generation methods. The study also describes the methodology of the complete project, covering the text-to-image model, chatbot integration, and the web application. Finally, the results are presented, along with a discussion of potential future research directions.
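The interactive workflow the abstract describes, where a chatbot refines the user's request before each image is generated, can be sketched as a simple loop. This is a minimal illustration only: `refine_prompt` stands in for the LLM chatbot and `generate_image` for the GAN/diffusion synthesizer, and both are hypothetical stubs rather than the authors' implementation.

```python
def refine_prompt(prompt: str, feedback: str) -> str:
    """Stand-in for an LLM chatbot call that folds user feedback into the prompt."""
    return f"{prompt}, {feedback}" if feedback else prompt


def generate_image(prompt: str) -> dict:
    """Stand-in for the GAN + diffusion synthesizer; returns metadata only."""
    return {"prompt": prompt, "status": "generated"}


def interactive_session(initial_prompt: str, feedback_rounds: list) -> dict:
    """Generate an image, then regenerate after each round of chatbot-mediated feedback."""
    prompt = initial_prompt
    result = generate_image(prompt)
    for feedback in feedback_rounds:
        prompt = refine_prompt(prompt, feedback)
        result = generate_image(prompt)
    return result


result = interactive_session("a red apple", ["watercolor style", "on a table"])
print(result["prompt"])  # a red apple, watercolor style, on a table
```

In a real deployment the web application would route each feedback round through the chatbot and return the regenerated image to the user, as the abstract outlines.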


Published

2025-12-08

Section

Articles