TKG-DM: Training-free Chroma Key Content Generation Diffusion Model

Ryugo Morita1,2, Stanislav Frolov2, Brian Bernhard Moser2, Takahiro Shirakawa, Ko Watanabe2,
Andreas Dengel2, Jinjia Zhou1

1 Hosei University, Tokyo, Japan | 2 RPTU Kaiserslautern-Landau & DFKI GmbH, Germany

CVPR 2025 Highlights

Abstract

We introduce TKG-DM, a training-free diffusion model that efficiently generates images with foreground objects placed over a uniform chroma key background. By optimizing the initial random noise through a novel "Channel Mean Shift" method, our model precisely controls the background color without requiring any fine-tuning or additional datasets. Experiments show that TKG-DM outperforms existing methods in quality, flexibility, and ease of use, and that it extends naturally to consistency models and text-to-video generation.

Overview of TKG-DM

Figure: TKG-DM model overview

TKG-DM manipulates the initial Gaussian noise $\mathbf{z}_T$ through a channel mean shift $F_c$, producing "init color noise" $\mathbf{z}_T^*$. A Gaussian mask then blends this shifted noise with the original noise, guiding Stable Diffusion to generate chroma key images aligned with the text prompt without any fine-tuning.
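In symbols (a compact restatement of the overview above; the mask symbol $M$ and the element-wise product $\odot$ are notation chosen for this page rather than taken verbatim from the paper):

$$\mathbf{z}_T^* = F_c(\mathbf{z}_T), \qquad \mathbf{z}_T^{\mathrm{init}} = M \odot \mathbf{z}_T + (1 - M) \odot \mathbf{z}_T^*$$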

Key Methodology

Channel Mean Shift

Figure: Channel Mean Shift

Shifting the mean of each channel of the initial noise gives precise control over the background color, enabling targeted chroma key generation.
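As a rough illustration, the sketch below shifts each latent channel's mean by a constant per-channel offset. This is a simplified stand-in for the paper's $F_c$, not the authors' implementation; the function name, latent shape, and offset values are hypothetical.

    import torch

    def channel_mean_shift(z_T: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
        """Shift the mean of each channel of the initial noise z_T (B, C, H, W)."""
        # Adding a per-channel constant moves that channel's mean while leaving its
        # variance untouched; the offsets steer the decoded background color.
        return z_T + shift.view(1, -1, 1, 1)

    # Hypothetical example: a 4-channel Stable Diffusion latent with placeholder offsets.
    z_T = torch.randn(1, 4, 128, 128)
    shift = torch.tensor([0.0, 0.6, 0.0, -0.6])  # placeholder values, not from the paper
    z_T_star = channel_mean_shift(z_T, shift)    # "init color noise"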

Init Noise Selection

A Gaussian mask smoothly blends the original noise with the shifted noise, separating the foreground region from the background.
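Continuing the sketch above (again an assumed formulation; the mask width sigma and the centering are illustrative choices, not the paper's exact parameters), a 2D Gaussian mask keeps the original noise near the image center and places the color-shifted noise toward the border:

    import torch

    def gaussian_mask(h: int, w: int, sigma: float = 0.25) -> torch.Tensor:
        """2D Gaussian mask: ~1 at the center (foreground), falling toward 0 at the border."""
        ys = torch.linspace(-1.0, 1.0, h)
        xs = torch.linspace(-1.0, 1.0, w)
        yy, xx = torch.meshgrid(ys, xs, indexing="ij")
        return torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))

    def blend_init_noise(z_T: torch.Tensor, z_T_star: torch.Tensor, sigma: float = 0.25) -> torch.Tensor:
        """Blend original noise (foreground) with init color noise (background)."""
        _, _, h, w = z_T.shape
        m = gaussian_mask(h, w, sigma).to(z_T).view(1, 1, h, w)  # broadcast over batch/channel
        return m * z_T + (1.0 - m) * z_T_star

The blended tensor can then be supplied as the initial latents of an off-the-shelf Stable Diffusion pipeline (for example, via the latents argument of a diffusers text-to-image pipeline), which is what keeps the approach training-free.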

Experimental Results

Qualitative Results (SDXL)

Figure: Qualitative comparison

Our model consistently generates precise chroma key backgrounds and high-quality foregrounds without relying on prompt engineering or fine-tuning.

Quantitative Results (SDXL)

Method FID ↓ m-FID ↓ CLIP-I ↑ CLIP-S ↑
DeepFloyd (GBP) 31.57 20.31 0.781 0.270
SDXL (GBP) 45.32 39.17 0.759 0.272
LayerDiffuse (Fine-tuned) 29.34 29.82 0.778 0.276
Ours (Training-free) 41.81 31.43 0.763 0.273

Bold: best results; underline: second-best results.

Application to ControlNet

Qualitative Results (ControlNet)

Figure: ControlNet qualitative comparison

Comparison of ControlNet methods. TKG-DM generates cleaner foregrounds with uniform chroma key backgrounds compared to existing methods using Green Background Prompt (GBP).

Quantitative Results (ControlNet)

Method FID ↓ m-FID ↓ CLIP-I ↑ CLIP-S ↑
SDXL (GBP) 22.04 18.62 0.819 0.279
Ours (Training-free) 17.09 17.22 0.834 0.284

Bold indicates the best performance.

Applications to Other Tasks

Text-to-Video Generation

Our model successfully extends to text-to-video generation by applying chroma key backgrounds consistently across video frames, enabling easy background replacement and editing workflows.

Consistency Models

TKG-DM integrates seamlessly with consistency models, producing high-quality chroma key images in far fewer sampling steps and thus at substantially lower cost.
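As a hedged sketch of one way to combine the two (the checkpoint names, LCM-LoRA weights, prompt, and step count below are illustrative assumptions, not the paper's exact setup), the same manipulated initial noise can be handed to a few-step latent-consistency sampler:

    import torch
    from diffusers import StableDiffusionXLPipeline, LCMScheduler

    # Illustrative setup: SDXL base with an LCM-LoRA for few-step sampling.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

    # init_latents: the blended noise from the sketches above (shape must match the
    # pipeline's latent shape, e.g. (1, 4, 128, 128) for 1024x1024 SDXL images).
    init_latents = torch.randn(1, 4, 128, 128, dtype=torch.float16, device="cuda")  # placeholder

    image = pipe(
        prompt="a red apple on a table",  # example prompt
        num_inference_steps=4,
        guidance_scale=1.0,
        latents=init_latents,
    ).images[0]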

Citation

@article{morita2024tkg,
    title={TKG-DM: Training-free Chroma Key Content Generation Diffusion Model},
    author={Morita, Ryugo and Frolov, Stanislav and Moser, Brian Bernhard and Shirakawa, Takahiro and Watanabe, Ko and Dengel, Andreas and Zhou, Jinjia},
    journal={arXiv preprint arXiv:2411.15580},
    year={2024}
}