Methods for Ensuring Consistent Generation in Diffusion Models
DOI: https://doi.org/10.31649/1997-9266-2024-175-4-75-85
Keywords: deep learning, image generation, generative diffusion models, generation consistency, conceptual consistency
Abstract
The article investigates the problem of consistent generation in diffusion models. Modern generative diffusion models can create high-fidelity images, but keeping related generation results consistent with one another remains a challenging task. The key existing methods for ensuring generation consistency are analyzed, and their advantages and disadvantages are identified. In addition, a new type of consistency is introduced: conceptual consistency, which makes it possible to assess a model's ability not only to reproduce existing styles and objects but also to generate entirely new visual ideas that it never encountered during training. Image-to-image generation from an input reference image has the advantage of being simple to implement. Fine-tuning methods such as DreamBooth and LoRA DreamBooth provide broader control over object consistency. ControlNet models ensure shape consistency by using a special input image that serves as a shape guide during the reverse diffusion process. Noise inversion methods allow more precise control and iterative refinement of the resulting images through manipulation of the noise space, enabling the generation of more stylistically and conceptually consistent images. The StyleAligned method uses a shared attention mechanism to ensure the stylistic consistency of generated images. Understanding the capabilities and limitations of these methods makes it possible to select the most effective set of tools for the task at hand. Diffusion models continue to evolve and expand into new areas, so achieving reliable and universal consistency in them could pave the way for even more creative and effective solutions.
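The shared attention mechanism mentioned for StyleAligned can be illustrated with a short sketch. It follows the formulation described by Hertz et al. ("Style Aligned Image Generation via Shared Attention"): the target image's queries and keys are first re-normalized with AdaIN using the reference image's queries and keys, and the target then attends to the concatenation of the reference's and its own keys and values. The simplifications are assumptions made here for illustration, not the authors' code: a single attention head, no learned projections, tokens stored as plain Python lists, and AdaIN statistics computed per channel over the token axis. All function names are hypothetical.

```python
import math

def mean_std(x):
    """Per-channel mean and standard deviation over the token axis.
    x is a list of tokens, each token a list of channel values."""
    n, d = len(x), len(x[0])
    mu = [sum(tok[c] for tok in x) / n for c in range(d)]
    sd = [math.sqrt(sum((tok[c] - mu[c]) ** 2 for tok in x) / n) for c in range(d)]
    return mu, sd

def adain(x, y, eps=1e-5):
    """AdaIN(x, y): shift and scale each channel of x to match y's mean and std."""
    mx, sx = mean_std(x)
    my, sy = mean_std(y)
    return [[sy[c] * (tok[c] - mx[c]) / (sx[c] + eps) + my[c] for c in range(len(tok))]
            for tok in x]

def attention(q, k, v):
    """Single-head scaled dot-product attention over lists of token vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z for c in range(len(v[0]))])
    return out

def shared_attention(q_t, k_t, v_t, q_r, k_r, v_r):
    """Target attends to reference + own tokens, with AdaIN applied to its Q and K."""
    q_hat = adain(q_t, q_r)  # target queries take on the reference query statistics
    k_hat = adain(k_t, k_r)  # target keys take on the reference key statistics
    # Concatenate along the token axis: reference tokens first, then target tokens.
    return attention(q_hat, k_r + k_hat, v_r + v_t)

# Usage: the reference uses ordinary self-attention; the target shares it.
ref = [[0.1, 0.5, -0.3], [0.7, -0.2, 0.4], [-0.5, 0.3, 0.9]]
tgt = [[1.0, 2.0, 0.5], [0.0, -1.0, 1.5], [2.0, 0.5, -0.5], [0.3, 0.3, 0.3]]
ref_out = attention(ref, ref, ref)
tgt_out = shared_attention(tgt, tgt, tgt, ref, ref, ref)
```

Keeping the target's own keys and values in the concatenation lets each target image preserve its own content while still drawing on the reference's features; the AdaIN step aligns the target's query and key statistics with the reference so that attention to reference tokens is not drowned out.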
References
Chenshuang Zhang, Chaoning Zhang, et al., “Text-to-image Diffusion Models in Generative AI: A Survey,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2303.07909 . Accessed on: April 29, 2024.
Dustin Podell, Zion English, et al., “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2307.01952 . Accessed on: April 29, 2024.
Ling Yang, Zhilong Zhang, et al., “Diffusion Models: A Comprehensive Survey of Methods and Applications,” in arXiv e-prints, 2022. [Online]. Available: https://arxiv.org/abs/2209.00796 . Accessed on: April 29, 2024.
Omri Avrahami, Amir Hertz, et al., “The Chosen One: Consistent Characters in Text-to-Image Diffusion Models,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2311.10093 . Accessed on: April 29, 2024.
Jonathan Ho, Ajay Jain, and Pieter Abbeel, “Denoising Diffusion Probabilistic Models,” in arXiv e-prints, 2020. [Online]. Available: https://arxiv.org/abs/2006.11239 . Accessed on: April 29, 2024.
Yong-Hyun Park, Mingi Kwon, et al., “Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2307.12868 . Accessed on: April 29, 2024.
Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in arXiv e-prints, 2015. [Online]. Available: https://arxiv.org/abs/1505.04597 . Accessed on: April 29, 2024.
Diederik P. Kingma and Max Welling, “An Introduction to Variational Autoencoders,” in arXiv e-prints, 2019. [Online]. Available: https://arxiv.org/abs/1906.02691 . Accessed on: April 29, 2024.
Judith E. Fan, Wilma A. Bainbridge, et al., “Drawing as a versatile cognitive tool,” Nature Reviews Psychology, 2023. https://doi.org/10.1038/s44159-023-00212-w .
G. Greenberg, “Semantics of pictorial space,” Review of Philosophy and Psychology, 2021. https://doi.org/10.1007/s13164-020-00513-6 .
Gihyun Kwon and Jong Chul Ye, “Diffusion-based Image Translation using Disentangled Style and Content Representation,” in arXiv e-prints, 2022. [Online]. Available: https://arxiv.org/abs/2209.15264 . Accessed on: April 29, 2024.
Aaron Hertzmann, “Toward a theory of perspective perception in pictures,” Journal of Vision, 2024. https://doi.org/10.1167/jov.24.4.23 .
Chenlin Meng, Yutong He, et al., “SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations,” in arXiv e-prints, 2021. [Online]. Available: https://arxiv.org/abs/2108.01073 . Accessed on: April 29, 2024.
Nataniel Ruiz, Yuanzhen Li, et al., “DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation,” in arXiv e-prints, 2022. [Online]. Available: https://arxiv.org/abs/2208.12242 . Accessed on: April 29, 2024.
Edward J. Hu, Yelong Shen, et al., “LoRA: Low-Rank Adaptation of Large Language Models,” in arXiv e-prints, 2021. [Online]. Available: https://arxiv.org/abs/2106.09685 . Accessed on: April 29, 2024.
Lvmin Zhang, Anyi Rao, et al., “Adding Conditional Control to Text-to-Image Diffusion Models,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2302.05543 . Accessed on: April 29, 2024.
Ron Mokady, Amir Hertz, et al., “Null-text Inversion for Editing Real Images using Guided Diffusion Models,” in arXiv e-prints, 2022. [Online]. Available: https://arxiv.org/abs/2211.09794 . Accessed on: April 29, 2024.
Inbar Huberman-Spiegelglas, Vladimir Kulikov, et al., “An Edit Friendly DDPM Noise Space: Inversion and Manipulations,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2304.06140 . Accessed on: April 29, 2024.
Amir Hertz, Andrey Voynov, et al., “Style Aligned Image Generation via Shared Attention,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2312.02133 . Accessed on: April 29, 2024.
Xun Huang and Serge Belongie, “Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization,” in arXiv e-prints, 2017. [Online]. Available: https://arxiv.org/abs/1703.06868 . Accessed on: April 29, 2024.
Downloads
PDF (Ukrainian)
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).