TKG-DM: Training-free Chroma Key Content Generation Diffusion Model

Ryugo Morita1,2, Stanislav Frolov2, Brian Bernhard Moser2, Takahiro Shirakawa, Ko Watanabe2,
Andreas Dengel2, Jinjia Zhou1

1 Hosei University, Tokyo, Japan | 2 RPTU Kaiserslautern-Landau & DFKI GmbH, Germany

CVPR 2025 Highlights

Abstract

We introduce TKG-DM, a training-free diffusion model that efficiently generates images with foreground objects placed over a uniform chroma key background. By optimizing the initial random noise through a novel "Channel Mean Shift" method, our model precisely controls the background color without requiring any fine-tuning or additional datasets. Experiments show that TKG-DM outperforms existing methods in quality, flexibility, and ease of use, and that it extends naturally to consistency models and text-to-video generation.

Overview of TKG-DM

Figure: TKG-DM model overview

TKG-DM manipulates the initial Gaussian noise $\mathbf{z}_T$ through a channel mean shift $F_c$, producing "init color noise" $\mathbf{z}_T^*$. A Gaussian mask then blends this shifted noise with the original noise, guiding Stable Diffusion to generate chroma key images aligned with the text prompt without any fine-tuning.
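In symbols (a compact restatement of the overview above; the mask symbol $M$ and the element-wise product $\odot$ are notation chosen for this page rather than taken verbatim from the paper):

$$\mathbf{z}_T^* = F_c(\mathbf{z}_T), \qquad \mathbf{z}_T^{\mathrm{init}} = M \odot \mathbf{z}_T + (1 - M) \odot \mathbf{z}_T^*$$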

Key Methodology

Channel Mean Shift

Figure: Channel Mean Shift

Shifting the mean of each channel of the initial noise gives precise control over the background color, enabling targeted chroma key generation.
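As a rough illustration, the sketch below shifts each latent channel's mean by a constant per-channel offset. This is a simplified stand-in for the paper's $F_c$, not the authors' implementation; the function name, latent shape, and offset values are hypothetical.

    import torch

    def channel_mean_shift(z_T: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
        """Shift the mean of each channel of the initial noise z_T (B, C, H, W)."""
        # Adding a per-channel constant moves that channel's mean while leaving its
        # variance untouched; the offsets steer the decoded background color.
        return z_T + shift.view(1, -1, 1, 1)

    # Hypothetical example: a 4-channel Stable Diffusion latent with placeholder offsets.
    z_T = torch.randn(1, 4, 128, 128)
    shift = torch.tensor([0.0, 0.6, 0.0, -0.6])  # placeholder values, not from the paper
    z_T_star = channel_mean_shift(z_T, shift)    # "init color noise"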

Init Noise Selection

A Gaussian mask smoothly blends the original noise with the shifted noise, separating the foreground region from the background.
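Continuing the sketch above (again an assumed formulation; the mask width sigma and the centering are illustrative choices, not the paper's exact parameters), a 2D Gaussian mask keeps the original noise near the image center and places the color-shifted noise toward the border:

    import torch

    def gaussian_mask(h: int, w: int, sigma: float = 0.25) -> torch.Tensor:
        """2D Gaussian mask: ~1 at the center (foreground), falling toward 0 at the border."""
        ys = torch.linspace(-1.0, 1.0, h)
        xs = torch.linspace(-1.0, 1.0, w)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        return torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))

    def blend_init_noise(z_T: torch.Tensor, z_T_star: torch.Tensor, sigma: float = 0.25) -> torch.Tensor:
        """Blend original noise (foreground) with init color noise (background)."""
        _, _, h, w = z_T.shape
        m = gaussian_mask(h, w, sigma).to(z_T).view(1, 1, h, w)  # broadcast over batch/channel
        return m * z_T + (1.0 - m) * z_T_star

The blended tensor can then be supplied as the initial latents of an off-the-shelf Stable Diffusion pipeline (for example, via the latents argument of a diffusers text-to-image pipeline), which is what keeps the approach training-free.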

Experimental Results

Qualitative Results (SDXL)

Figure: Qualitative comparison

Our model consistently generates precise chroma key backgrounds and high-quality foregrounds without relying on prompt engineering or fine-tuning.

Quantitative Results (SDXL)

Method FID ↓ m-FID ↓ CLIP-I ↑ CLIP-S ↑
DeepFloyd (GBP) 31.57 20.31 0.781 0.270
SDXL (GBP) 45.32 39.17 0.759 0.272
LayerDiffuse (Fine-tuned) 29.34 29.82 0.778 0.276
Ours (Training-free) 41.81 31.43 0.763 0.273

Bold: best results; underline: second-best results.

Application to ControlNet

Qualitative Results (ControlNet)

Figure: ControlNet qualitative comparison

Comparison of ControlNet methods. TKG-DM generates cleaner foregrounds with uniform chroma key backgrounds compared to existing methods using Green Background Prompt (GBP).

Quantitative Results (ControlNet)

Method FID ↓ m-FID ↓ CLIP-I ↑ CLIP-S ↑
SDXL (GBP) 22.04 18.62 0.819 0.279
Ours (Training-free) 17.09 17.22 0.834 0.284

Bold indicates the best performance.

Applications to Other Tasks

Text-to-Video Generation

Our model successfully extends to text-to-video generation by applying chroma key backgrounds consistently across video frames, enabling easy background replacement and editing workflows.

Consistency Models

TKG-DM integrates seamlessly with consistency models, producing high-quality chroma key images in far fewer sampling steps and thus at substantially lower cost.
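As a hedged sketch of one way to combine the two (the checkpoint names, LCM-LoRA weights, prompt, and step count below are illustrative assumptions, not the paper's exact setup), the same manipulated initial noise can be handed to a few-step latent-consistency sampler:

    import torch
    from diffusers import StableDiffusionXLPipeline, LCMScheduler

    # Illustrative setup: SDXL base with an LCM-LoRA for few-step sampling.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

    # init_latents: the blended noise from the sketches above (shape must match the
    # pipeline's latent shape, e.g. (1, 4, 128, 128) for 1024x1024 SDXL images).
    init_latents = torch.randn(1, 4, 128, 128, dtype=torch.float16, device="cuda")  # placeholder

    image = pipe(
        prompt="a red apple on a table",  # example prompt
        num_inference_steps=4,
        guidance_scale=1.0,
        latents=init_latents,
    ).images[0]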

Citation

@article{morita2024tkg,
    title={TKG-DM: Training-free Chroma Key Content Generation Diffusion Model},
    author={Morita, Ryugo and Frolov, Stanislav and Moser, Brian Bernhard and Shirakawa, Takahiro and Watanabe, Ko and Dengel, Andreas and Zhou, Jinjia},
    journal={arXiv preprint arXiv:2411.15580},
    year={2024}
}