Bibliographic Details
Title: |
BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing |
Authors: |
Li, Yaowei; Li, Lingen; Zhang, Zhaoyang; Li, Xiaoyu; Wang, Guangzhi; Li, Hongxiang; Cun, Xiaodong; Shan, Ying; Zou, Yuexian |
Publication Year: |
2025 |
Collection: |
Computer Science |
Subject Terms: |
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Multimedia |
More Details: |
Element-level visual manipulation is essential in digital content creation, but current diffusion-based methods lack the precision and flexibility of traditional tools. In this work, we introduce BlobCtrl, a framework that unifies element-level generation and editing using a probabilistic blob-based representation. By employing blobs as visual primitives, our approach effectively decouples and represents spatial location, semantic content, and identity information, enabling precise element-level manipulation. Our key contributions include: 1) a dual-branch diffusion architecture with hierarchical feature fusion for seamless foreground-background integration; 2) a self-supervised training paradigm with tailored data augmentation and score functions; and 3) controllable dropout strategies to balance fidelity and diversity. To support further research, we introduce BlobData for large-scale training and BlobBench for systematic evaluation. Experiments show that BlobCtrl excels in various element-level manipulation tasks while maintaining computational efficiency, offering a practical solution for precise and flexible visual content creation. Project page: https://liyaowei-stu.github.io/project/BlobCtrl/ |
Document Type: |
Working Paper |
Access URL: |
http://arxiv.org/abs/2503.13434 |
Accession Number: |
edsarx.2503.13434 |
Database: |
arXiv |