Awesome Diffusion Transformers

| Title | Date | Venue | Task | Resource |
| --- | --- | --- | --- | --- |
| MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model | 31 Aug 2022 | TPAMI'2024 | Others | Website, Code |
| All are Worth Words: A ViT Backbone for Diffusion Models | 25 Sep 2022 | CVPR'2023 | Image | Code |
| Learning to Learn with Generative Models of Neural Network Checkpoints | 26 Sep 2022 | arXiv | Others | Website, Code |
| Scalable Diffusion Models with Transformers | 19 Dec 2022 | ICCV'2023 | Image | Website, Code |
| Exploring Vision Transformers as Diffusion Learners | 28 Dec 2022 | arXiv | Image | |
| DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer | 07 Mar 2023 | ICCV'2023 | Others | Website, Code |
| Masked Diffusion Transformer is a Strong Image Synthesizer | 25 Mar 2023 | ICCV'2023 | Image | Code |
| Diffusion Transformer for Adaptive Text-to-Speech | 03 May 2023 | Interspeech'2023 | Speech | Website |
| VDT: General-purpose Video Diffusion Transformers via Mask Modeling | 22 May 2023 | ICLR'2024 | Video | Website, Code |
| ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer | 22 May 2023 | EMNLP'2023 | Speech | Website |
| U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech | 22 May 2023 | arXiv | Speech | Website, Code |
| Fast Training of Diffusion Models with Masked Transformers | 15 Jun 2023 | TMLR | Image | Code |
| DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation | 04 Jul 2023 | NeurIPS'2023 | 3D | Website, Code |
| Large-Vocabulary 3D Diffusion Model with Transformer | 14 Sep 2023 | ICLR'2024 | 3D | Website, Code |
| Cartoondiff: Training-free Cartoon Image Generation with Diffusion Transformer Models | 15 Sep 2023 | arXiv | Image | Website, Code |
| PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | 30 Sep 2023 | ICLR'2024 | Image | Website, Code |
| Dolfin: Diffusion Layout Transformers without Autoencoder | 25 Oct 2023 | arXiv | Others | |
| Mapache: Masked parallel transformer for advanced speech editing and synthesis | 03 Dec 2023 | ICASSP'2024 | Speech | |
| DiffiT: Diffusion Vision Transformers for Image Generation | 04 Dec 2023 | arXiv | Image | Code |
| GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation | 07 Dec 2023 | CVPR'2024 | Image, Video | Website |
| Photorealistic Video Generation with Diffusion Models | 11 Dec 2023 | arXiv | Video | Website |
| DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers | 11 Dec 2023 | arXiv | Others | |
| Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation | 12 Dec 2023 | arXiv | 3D | Website |
| NViST: In the Wild New View Synthesis from a Single Image with Transformers | 13 Dec 2023 | arXiv | Others | Website |
| TransDDPM: Transformer-Based Denoising Diffusion Probabilistic Model for Image Restoration | 28 Dec 2023 | PRCV'2023 | Image | |
| Latte: Latent Diffusion Transformer for Video Generation | 05 Jan 2024 | arXiv | Video | Website, Code |
| PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | 10 Jan 2024 | arXiv | Image | Website, Code |
| SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | 16 Jan 2024 | arXiv | Image | Website, Code |
| Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | 21 Jan 2024 | arXiv | Image | Website, Code |
| Cross-view Masked Diffusion Transformers for Person Image Synthesis | 02 Feb 2024 | arXiv | Image | |
| DiffsFormer: A Diffusion Transformer on Stock Factor Augmentation | 05 Feb 2024 | arXiv | Others | |
| Sora | 15 Feb 2024 | OpenAI | Image, Video | Website |
| SDiT: Spiking Diffusion Model with Transformer | 18 Feb 2024 | arXiv | Image | |
| FiT: Flexible Vision Transformer for Diffusion Model | 19 Feb 2024 | arXiv | Image | Code |
| Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | 22 Feb 2024 | arXiv | Video | Website |
| OpenDiT | 26 Feb 2024 | GitHub | Image, Video | Code |
| FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes | 28 Feb 2024 | arXiv | Image | Website, Code |
| Open-Sora-Plan | 01 Mar 2024 | GitHub | Video | Website, Code |
| Stable Diffusion 3: Research Paper | 05 Mar 2024 | Stability AI | Image | Website |