A Detailed Study of Robust Generative Models Adversarial Attacks and their Implications

Authors

  • Jivesh Nage
  • Mangesh Sawarkar
  • Goldi Soni

Keywords

Adversarial attacks, Generative models, Latent space attacks, Model poisoning, Robustness in AI

Abstract

Robustness to adversarial attacks is especially important for generative models, which are employed in a wide range of applications, including image synthesis, text generation, and data augmentation. In this study, we examine vulnerabilities in the design of generative models, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), autoregressive models, and diffusion models. We classify the main types of adversarial attacks, namely input attacks, latent space attacks, and model poisoning, and discuss each in detail, covering broad methodologies such as gradient-based and optimization-based procedures. The consequences of these attacks, ranging from degraded output quality to security breaches, underline why understanding and mitigating adversarial risks is essential. We review existing defense mechanisms, such as adversarial training, regularization techniques, and robust model architectures, and examine their effectiveness and limitations. The concluding discussion covers ongoing challenges, including the generalization of defenses across attack types and model architectures, and outlines open research questions that must be addressed to improve the resilience of generative models against adversarial threats. This work contributes to the ongoing discussion about improving the safety and reliability of generative models in real-world settings.
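To make the attack surface concrete, the sketch below illustrates one of the gradient-based input attacks the abstract refers to: a single FGSM-style step that perturbs an input so that a VAE reconstructs it poorly. The `vae` model, its `encode`/`decode` interface, and the perturbation budget `epsilon` are assumptions made for illustration, not details taken from this article.

```python
# Minimal sketch of a gradient-based (FGSM-style) input attack on a VAE.
# Assumptions: `vae` is a torch.nn.Module exposing encode() -> (mu, logvar)
# and decode(z), and `x` is a batch of images scaled to [0, 1].
import torch
import torch.nn.functional as F

def fgsm_attack_on_vae(vae, x, epsilon=0.05):
    """Return a perturbed copy of x whose reconstruction drifts away from x."""
    x_adv = x.clone().detach().requires_grad_(True)

    # Forward pass through the (assumed) VAE interface.
    mu, logvar = vae.encode(x_adv)
    z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
    x_rec = vae.decode(z)

    # Maximize the reconstruction error relative to the clean input.
    loss = F.mse_loss(x_rec, x)
    loss.backward()

    # One signed-gradient ascent step, projected back into the valid pixel range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

Repeating this step with a smaller step size and re-projection onto the epsilon ball would give a PGD-style variant of the same idea; optimization-based attacks replace the single signed step with an explicit minimization of perturbation size subject to a distortion goal.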

References

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, et al., “Generative Adversarial Nets,” Advances in Neural Information Processing Systems, vol. 27, 2014. Available: https://papers.nips.cc/paper_files/paper/2014/hash/f033ed80deb0234979a61f95710dbe25-Abstract.html

D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” arXiv (Cornell University), Dec. 2013, doi: https://doi.org/10.48550/arxiv.1312.6114.

N. Carlini and D. Wagner, "Towards Evaluating the Robustness of Neural Networks," 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 2017, pp. 39-57, doi: https://doi.org/10.1109/SP.2017.49.

N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik and A. Swami, "The Limitations of Deep Learning in Adversarial Settings," 2016 IEEE European Symposium on Security and Privacy (EuroS&P), Saarbruecken, Germany, 2016, pp. 372-387, doi: https://doi.org/10.1109/EuroSP.2016.36.

J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks,” arXiv (Cornell University), Mar. 2017, doi: https://doi.org/10.48550/arxiv.1703.10593.

Y. Song and S. Ermon, “Generative Modeling by Estimating Gradients of the Data Distribution,” arXiv (Cornell University), Jul. 2019, doi: https://doi.org/10.48550/arxiv.1907.05600.

P. Chanakya, P. Harsha and K. P. Singh, "Robustness of Generative Adversarial CLIPs Against Single-Character Adversarial Attacks in Text-to-Image Generation," in IEEE Access, vol. 12, pp. 162551-162563, 2024, doi: https://doi.org/10.1109/ACCESS.2024.3491017.

C. Xiao, B. Li, J.-Y. Zhu, W. He, M. Liu, and D. Song, “Generating Adversarial Examples with Adversarial Networks,” arXiv (Cornell University), Jan. 2018, doi: https://doi.org/10.48550/arxiv.1801.02610.

H. Sun, T. Zhu, Z. Zhang, D. Jin, P. Xiong and W. Zhou, "Adversarial Attacks Against Deep Generative Models on Data: A Survey," in IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 4, pp. 3367-3388, 1 April 2023, doi: https://doi.org/10.1109/TKDE.2021.3130903.

F. Tramèr, A. Kurakin, N. Papernot, I. J. Goodfellow, D. Boneh, and P. D. McDaniel, “Ensemble Adversarial Training: Attacks and Defenses,” arXiv (Cornell University), May 2017, doi: https://doi.org/10.48550/arxiv.1705.07204.

Published

2025-03-31

How to Cite

Jivesh Nage, Mangesh Sawarkar, & Goldi Soni. (2025). A Detailed Study of Robust Generative Models Adversarial Attacks and their Implications. Journal of Data Mining and Management, 10(1), 15–21. Retrieved from https://matjournals.net/engineering/index.php/JoDMM/article/view/1601

Issue

Vol. 10 No. 1 (2025)

Section

Articles