Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT

Bibliographic Details
Title:	Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
Authors:	Liu, Dongyang; Li, Shicheng; Liu, Yutong; Li, Zhen; Wang, Kai; Li, Xinyue; Qin, Qi; Liu, Yufei; Xin, Yi; Li, Zhongyu; Fu, Bin; Si, Chenyang; Cao, Yuewen; He, Conghui; Liu, Ziwei; Qiao, Yu; Hou, Qibin; Li, Hongsheng; Gao, Peng
Publication Year:	2025
Collection:	Computer Science
Subject Terms:	Computer Science - Computer Vision and Pattern Recognition
More Details:	Recent advancements have established Diffusion Transformers (DiTs) as a dominant framework in generative modeling. Building on this success, Lumina-Next achieves exceptional performance in the generation of photorealistic images with Next-DiT. However, its potential for video generation remains largely untapped, with significant challenges in modeling the spatiotemporal complexity inherent to video data. To address this, we introduce Lumina-Video, a framework that leverages the strengths of Next-DiT while introducing tailored solutions for video synthesis. Lumina-Video incorporates a Multi-scale Next-DiT architecture, which jointly learns multiple patchifications to enhance both efficiency and flexibility. By incorporating the motion score as an explicit condition, Lumina-Video also enables direct control of the dynamic degree of generated videos. Combined with a progressive training scheme with increasingly higher resolution and FPS, and a multi-source training scheme with mixed natural and synthetic data, Lumina-Video achieves remarkable aesthetic quality and motion smoothness with high training and inference efficiency. We additionally propose Lumina-V2A, a video-to-audio model based on Next-DiT, to create synchronized sounds for generated videos. Code is released at https://www.github.com/Alpha-VLLM/Lumina-Video.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2502.06782
Accession Number:	edsarx.2502.06782
Database:	arXiv
