1 Hosei University, Tokyo, Japan | 2 RPTU Kaiserslautern-Landau & DFKI GmbH, Germany
CVPR 2025 Highlights
We introduce TKG-DM, a training-free diffusion model that efficiently generates images with foreground objects placed over a uniform chroma key background. By optimizing the initial random noise through a novel "Channel Mean Shift" method, our model precisely controls the background color without any fine-tuning or additional datasets. Experiments show that TKG-DM outperforms existing methods in quality, flexibility, and ease of use, and that it extends naturally to consistency models and text-to-video generation.
TKG-DM manipulates the initial Gaussian noise $\mathbf{z}_T$ through a channel mean shift $F_c$, producing "init color noise" $\mathbf{z}_T^*$. Using a Gaussian mask, we blend this shifted noise with the original noise to guide Stable Diffusion, generating chroma key images aligned with text prompts without fine-tuning.
Adjusting each channel's mean in initial noise allows precise control over background colors, enabling targeted chroma key generation.
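The idea can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the channel indices and shift magnitudes below are hypothetical, and the paper's $F_c$ is reduced here to a constant per-channel offset.

```python
import torch

def channel_mean_shift(z_T: torch.Tensor, shifts) -> torch.Tensor:
    """Bias the mean of each latent channel by a constant offset.

    z_T: initial Gaussian noise of shape (C, H, W).
    shifts: per-channel offsets (length C); nonzero entries push the
    latent toward a target background color. A simplified stand-in
    for the paper's channel mean shift F_c.
    """
    offsets = torch.as_tensor(shifts, dtype=z_T.dtype).view(-1, 1, 1)
    return z_T + offsets

torch.manual_seed(0)
z_T = torch.randn(4, 64, 64)  # Stable Diffusion's latent space has 4 channels
z_T_star = channel_mean_shift(z_T, [0.0, 0.5, -0.3, 0.0])  # illustrative offsets
```

Because the shift is a constant added before sampling begins, the model's weights and the sampling loop are untouched; only the starting point of denoising changes.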
A Gaussian mask smoothly blends original noise and shifted noise, effectively separating foreground from background regions.
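The blending step can be sketched as follows, assuming a simple isotropic Gaussian centered on the latent grid; the mask's position and width (`sigma`) are hypothetical parameters chosen for illustration, not the paper's exact mask.

```python
import torch

def gaussian_mask(h: int, w: int, sigma: float = 0.25) -> torch.Tensor:
    """2D Gaussian over [-1, 1]^2, peaking at 1 in the center."""
    ys = torch.linspace(-1.0, 1.0, h)
    xs = torch.linspace(-1.0, 1.0, w)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))

def blend_noise(z_T: torch.Tensor, z_T_star: torch.Tensor,
                sigma: float = 0.25) -> torch.Tensor:
    """Keep the original noise near the center (foreground region) and
    the color-shifted noise toward the borders (background region)."""
    mask = gaussian_mask(z_T.shape[-2], z_T.shape[-1], sigma)
    return mask * z_T + (1.0 - mask) * z_T_star

torch.manual_seed(0)
z_T = torch.randn(4, 64, 64)
z_T_star = z_T + 0.5  # stand-in for the color-shifted "init color noise"
z_init = blend_noise(z_T, z_T_star)
```

The blended `z_init` then replaces the usual random latent handed to a standard Stable Diffusion sampler, so the foreground is denoised from ordinary noise while the borders start from color-biased noise.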
Our model consistently generates precise chroma key backgrounds and high-quality foregrounds without relying on prompt engineering or fine-tuning.
Method | FID ↓ | m-FID ↓ | CLIP-I ↑ | CLIP-S ↑ |
---|---|---|---|---|
DeepFloyd (GBP) | <u>31.57</u> | **20.31** | **0.781** | 0.270 |
SDXL (GBP) | 45.32 | 39.17 | 0.759 | 0.272 |
LayerDiffuse (Fine-tuned) | **29.34** | <u>29.82</u> | <u>0.778</u> | **0.276** |
Ours (Training-free) | 41.81 | 31.43 | 0.763 | <u>0.273</u> |
Bold: best results; underline: second-best results.
Comparison of ControlNet methods. TKG-DM generates cleaner foregrounds with uniform chroma key backgrounds compared to existing methods using Green Background Prompt (GBP).
Method | FID ↓ | m-FID ↓ | CLIP-I ↑ | CLIP-S ↑ |
---|---|---|---|---|
SDXL (GBP) | 22.04 | 18.62 | 0.819 | 0.279 |
Ours (Training-free) | **17.09** | **17.22** | **0.834** | **0.284** |
Bold indicates the best performance.
Our model successfully extends to text-to-video generation by applying chroma key backgrounds consistently across video frames, enabling easy background replacement and editing workflows.
TKG-DM integrates seamlessly with Consistency Models, generating high-quality chroma key images in far fewer sampling steps and significantly improving efficiency without sacrificing output quality.
@article{morita2024tkg,
title={TKG-DM: Training-free Chroma Key Content Generation Diffusion Model},
author={Morita, Ryugo and Frolov, Stanislav and Moser, Brian Bernhard and Shirakawa, Takahiro and Watanabe, Ko and Dengel, Andreas and Zhou, Jinjia},
journal={arXiv preprint arXiv:2411.15580},
year={2024}
}