Title: |
Counterfactual GAN for debiased text-to-image synthesis
Authors: |
Kong, Xianghua (1); Xu, Ning (2) ningxu@tju.edu.cn; Sun, Zefang (2); Shen, Zhewen (2); Zheng, Bolun (3); Yan, Chenggang (3); Cao, Jinbo (4); Kang, Rongbao (4); Liu, An-An (2)
Source: |
Multimedia Systems, Feb 2025, Vol. 31, Issue 1, pp. 1–11 (11 pages).
Abstract: |
Text-to-image (T2I) generation is the task of producing images that accurately reflect the content and meaning of a given textual description. Not every word in a sentence has a corresponding visual appearance: non-visual words such as “are” and “a” map to no visual regions, whereas visual words such as “men” and “basketball” are crucial to generating them. Existing methods treat all words equally when producing images, without distinguishing visual from non-visual words. Gradients from non-visual words can introduce harmful bias that diminishes the total effect of the textual guidance on the generated image content. This paper presents a new debiased text-to-image synthesis method based on a counterfactual learning scheme. First, we build a causal graph for text-to-image synthesis that captures the cause-and-effect roles of visual and non-visual words. Second, we treat the non-visual words as confounders that introduce misleading associations during image generation. To mitigate their influence, we model the process as computing the Natural Direct Effect on the generated visual features and use the Total Indirect Effect to produce the final image for debiased text-to-image generation. Applying our method to several existing models yields significant improvements on two widely used datasets. [ABSTRACT FROM AUTHOR]
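The abstract describes the debiasing idea in counterfactual terms: estimate the Natural Direct Effect (NDE) contributed by the non-visual confounders alone, then subtract it from the Total Effect (TE) so that only the Total Indirect Effect (TIE) conditions image generation. The sketch below is a minimal, hypothetical PyTorch illustration of that TIE computation at the text-feature level; the toy encoder, the visual-word mask, the neutral reference token, and all names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of TIE-style debiasing:
# encode the factual sentence (TE), encode a counterfactual sentence in which
# visual words are replaced by a neutral reference token so only the direct
# path of the non-visual words remains (NDE), and subtract: TIE = TE - NDE.

import torch
import torch.nn as nn


class TextEncoder(nn.Module):
    """Toy bag-of-embeddings encoder standing in for the real T2I text encoder."""

    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> sentence feature: (batch, dim)
        return self.embed(token_ids).mean(dim=1)


def tie_text_feature(encoder, token_ids, visual_mask, ref_id=0):
    """Return a debiased text feature via TIE = TE - NDE.

    visual_mask: (batch, seq_len) bool tensor, True where a token is a visual word.
    ref_id: id of a neutral reference token used in the counterfactual input.
    """
    # Total effect: encode the factual sentence with all words present.
    te = encoder(token_ids)

    # Counterfactual input: keep only the non-visual words, replacing visual
    # words with the neutral reference, so the encoding isolates the direct
    # (confounding) contribution of the non-visual words.
    cf_ids = torch.where(visual_mask, torch.full_like(token_ids, ref_id), token_ids)
    nde = encoder(cf_ids)

    # Total indirect effect: remove the non-visual (direct) contribution.
    return te - nde


if __name__ == "__main__":
    enc = TextEncoder(vocab_size=100)
    ids = torch.tensor([[5, 17, 3, 42]])              # e.g. "two men play basketball"
    vis = torch.tensor([[False, True, False, True]])  # "men", "basketball" are visual
    feat = tie_text_feature(enc, ids, vis)
    print(feat.shape)  # torch.Size([1, 64]) -- feature passed on to the generator
```

In a full T2I pipeline the resulting feature would condition the generator in place of the raw sentence encoding, which is the sense in which the TIE, rather than the confounded total effect, guides image synthesis.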
|
Database: |
Academic Search Complete |