Paper
Image Caption
- Generating Natural-Language Video Descriptions Using Text-Mined Knowledge
- Explain Images with Multimodal Recurrent Neural Networks
- Sequence to Sequence Learning with Neural Networks
- Show and Tell: A Neural Image Caption Generator
- Show, Edit and Tell: A Framework for Editing Image Captions
- Large-scale Video Classification with Convolutional Neural Networks
- Better Captioning with Sequence-Level Exploration
- Transform and Tell: Entity-Aware News Image Captioning
- Context-Aware Group Captioning via Self-Attention and Contrastive Features
Back+text2Image
GAN-Based
Image Manipulation
GAN-Based
- ManiGAN: Text-Guided Image Manipulation
- Segmentation-Aware Text-Guided Image Manipulation
- Semantic Image Synthesis via Adversarial Learning
- Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language
Diffusion Model-Based
- DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
- InstructPix2Pix: Learning to Follow Image Editing Instructions
VAE-Based
Text to Image
- Generative Adversarial Text to Image Synthesis
- Learning What and Where to Draw
- StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
- AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
- Semi-supervised FusedGAN for Conditional Image Generation
- Controllable Text-to-Image Generation
- TAC-GAN – Text Conditioned Auxiliary Classifier Generative Adversarial Network
- Cross-Modal Contrastive Learning for Text-to-Image Generation
- DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
- MirrorGAN: Learning Text-to-image Generation by Redescription
- Semantics Disentangling for Text-to-Image Generation
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
- TIME: Text and Image Mutual-Translation Adversarial Networks
- Cycle-Consistent Inverse GAN for Text-to-Image Synthesis
- Interactive Image Generation with Natural-Language Feedback
Text to Face
CV
- Stochastic Image-to-Video Synthesis using cINNs
- Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis
Others
- Mask R-CNN
- CLIPScore: A Reference-free Evaluation Metric for Image Captioning
- RePaint: Inpainting using Denoising Diffusion Probabilistic Models