A Fully Trained GAN-Based System for Generating Human Face Images from Text Descriptions
Keywords:
BiLSTM, CelebA, Deep learning, Generative adversarial network, Image synthesis, Text-to-face generation

Abstract
Text-to-face generation, a sub-domain of text-to-image synthesis, has a wide range of applications in public safety, forensics, and emerging research fields. However, progress in this field has been hampered by the scarcity of adequate datasets. This study presents a fully trained Generative Adversarial Network (GAN) system that trains the text encoder and the image decoder concurrently, yielding more accurate and efficient image generation. Across various experiments, the proposed system generates higher-quality images from input text descriptions. The study also introduces a hybrid dataset, created by combining the CelebA dataset with a locally prepared dataset, which enhances the training process.
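The abstract does not give implementation details, but the described architecture (a BiLSTM text encoder, per the keywords, trained jointly with a conditional GAN image decoder) can be illustrated with a minimal sketch. The following PyTorch code is a hypothetical outline under stated assumptions: all layer sizes, module names, and the training loop are illustrative, not the authors' implementation. The key point it demonstrates is the "fully trained" aspect: the text encoder's parameters are optimized together with the generator's, rather than being frozen or pretrained.

```python
# Minimal sketch (assumptions, not the authors' implementation):
# a BiLSTM text encoder trained jointly with a conditional GAN.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """BiLSTM that maps a token-id sequence to a fixed-size text embedding."""
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)

    def forward(self, tokens):                  # tokens: (B, T)
        out, _ = self.lstm(self.embed(tokens))  # (B, T, 2*hidden_dim)
        return out.mean(dim=1)                  # (B, 2*hidden_dim)

class Generator(nn.Module):
    """Decodes noise + text embedding into a 64x64 RGB face image."""
    def __init__(self, noise_dim=100, text_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim + text_dim, 256, 4, 1, 0),  # 4x4
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),                   # 8x8
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),                    # 16x16
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, 2, 1),                     # 32x32
            nn.BatchNorm2d(32), nn.ReLU(True),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),           # 64x64
        )

    def forward(self, z, text_emb):
        x = torch.cat([z, text_emb], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(x)

class Discriminator(nn.Module):
    """Scores an (image, text) pair as real or fake."""
    def __init__(self, text_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),    # 32x32
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),  # 16x16
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2), # 8x8
        )
        self.fc = nn.Linear(256 * 8 * 8 + text_dim, 1)

    def forward(self, img, text_emb):
        h = self.conv(img).flatten(1)
        return self.fc(torch.cat([h, text_emb], dim=1))

# Joint ("fully trained") setup: the encoder is optimized together with
# the generator, so both learn from the adversarial loss.
enc, gen, disc = TextEncoder(), Generator(), Discriminator()
opt_g = torch.optim.Adam(list(enc.parameters()) + list(gen.parameters()),
                         lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

tokens = torch.randint(0, 5000, (8, 16))  # dummy caption batch
real = torch.randn(8, 3, 64, 64)          # dummy image batch
z = torch.randn(8, 100)

# Discriminator step: real pairs -> 1, generated pairs -> 0.
emb = enc(tokens)
fake = gen(z, emb)
d_loss = bce(disc(real, emb.detach()), torch.ones(8, 1)) + \
         bce(disc(fake.detach(), emb.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator + encoder step: try to fool the discriminator.
emb = enc(tokens)
g_loss = bce(disc(gen(z, emb), emb), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Note the design choice this sketch highlights: in the generator step the text embedding is not detached, so the adversarial gradient flows back into the BiLSTM, which is what distinguishes a jointly trained encoder-decoder GAN from pipelines that condition on a fixed, pretrained text encoder.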