Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Bibliographic Details
Title: Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model
Authors: Chen, Yida; Viégas, Fernanda; Wattenberg, Martin
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
More Details: Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process, well before a human can easily make sense of the noisy images. Intervention experiments further indicate these representations play a causal role in image synthesis, and may be used for simple high-level editing of an LDM's output. Project page: https://yc015.github.io/scene-representation-diffusion-model/
Comment: A short version of this paper was accepted at the NeurIPS 2023 Workshop on Diffusion Models: https://nips.cc/virtual/2023/74894
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2306.05720
Accession Number: edsarx.2306.05720
Database: arXiv