Self-Forcing Implementation
A simple implementation of Self-Forcing for autoregressive video generation, based on the paper: https://arxiv.org/abs/2506.08009
For standard autoregressive generation (no GT frames):
- Teacher-Forcing: Trains with GT → inference without GT = exposure bias
- Self-Forcing: Trains with generated → inference with generated = no exposure bias ✓
- Diffusion-Forcing: Also handles this case well (boundary=0 during training)
So for standard generation, Self-Forcing and Diffusion-Forcing are similar - both avoid exposure bias.
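The exposure-bias contrast above can be made concrete with a toy sketch. The `predict` function below is a hypothetical stand-in for a video model that makes a small systematic error each step; under teacher forcing the error never compounds because the context is always ground truth, while under self forcing the model sees (and learns to cope with) its own accumulated errors, just as at inference time.

```python
def predict(context):
    """Toy model: predicts the last context value plus a small systematic error."""
    return context[-1] + 0.1

def teacher_forcing_rollout(gt_frames):
    # Each prediction conditions on ground-truth history (the training-time view).
    return [predict(gt_frames[:t + 1]) for t in range(len(gt_frames) - 1)]

def self_forcing_rollout(first_frame, num_steps):
    # Each prediction conditions on previously *generated* frames,
    # matching what actually happens at inference time.
    frames = [first_frame]
    for _ in range(num_steps):
        frames.append(predict(frames))
    return frames[1:]

gt = [0.0, 0.0, 0.0, 0.0]
print(teacher_forcing_rollout(gt))                          # [0.1, 0.1, 0.1] - error never compounds
print([round(f, 2) for f in self_forcing_rollout(0.0, 3)])  # [0.1, 0.2, 0.3] - error compounds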
But here’s where Diffusion-Forcing shines:
- One model, many tasks: the SAME model can:
  - Generate video from text (no frames given)
  - Continue a video (some frames given)
  - Fill in missing frames (inpainting)
  - Enhance low-res frames (some noisy frames given)
- Real-world scenarios often DO have some clean frames:
  - Video editing: “Keep frames 1-10, regenerate 11-20”
  - Video restoration: “These frames are corrupted, fix them”
  - Frame interpolation: “I have every 3rd frame, fill the gaps”
- Robustness: Because it trained with random boundaries, it’s more robust to different starting conditions
Example: Imagine you’re building a video AI product. With:
- Teacher/Self-Forcing: Need separate models for generation, inpainting, interpolation
- Diffusion-Forcing: ONE model handles everything
The advantage isn’t just about avoiding exposure bias - it’s about flexibility. You train once and can deploy for many different use cases.
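One way to see why a single Diffusion-Forcing model covers all these tasks: each task is just a different pattern of per-frame noise levels, where clean frames get level 0 and frames to be generated get level 1. The sketch below is illustrative only (the function name and task labels are invented for this example, not part of any implementation); training with a random boundary is what makes arbitrary inference-time masks work.

```python
import random

def sample_noise_levels(num_frames, task="train"):
    """Per-frame noise levels: 0.0 = clean (kept), 1.0 = pure noise (generated).
    Diffusion-Forcing trains with random patterns, so at inference
    any clean/noisy mask corresponds to a condition the model has seen."""
    if task == "train":
        # Random boundary: frames before it are clean, after it are noisy.
        boundary = random.randint(0, num_frames)
        return [0.0] * boundary + [1.0] * (num_frames - boundary)
    if task == "generate":      # text-to-video: no clean frames at all
        return [1.0] * num_frames
    if task == "continue":      # first half given, second half generated
        half = num_frames // 2
        return [0.0] * half + [1.0] * (num_frames - half)
    if task == "interpolate":   # every 3rd frame given, fill the gaps
        return [0.0 if t % 3 == 0 else 1.0 for t in range(num_frames)]
    raise ValueError(f"unknown task: {task}")

print(sample_noise_levels(6, "interpolate"))  # [0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
```

A Teacher- or Self-Forcing model, by contrast, is only ever trained on the all-noisy pattern, so the other masks fall outside its training distribution.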
Overview
Self-Forcing addresses the “exposure bias” problem in autoregressive models by training with self-generated context rather than ground-truth frames. This implementation demonstrates the key concepts:
- Autoregressive generation with KV caching: efficient streaming generation
- Self-generated context during training: reduces exposure bias
- Few-step diffusion: balances quality and speed
- Gradient truncation: manages computational cost
- Holistic sequence loss: improves temporal coherence
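The first concept, a rolling KV cache, can be sketched in a few lines. This is an illustrative stand-in, not the repo's actual KVCache class: it keeps attention keys/values for at most `max_frames` past frames, so streaming generation has bounded memory and per-step cost.

```python
from collections import deque

class RollingKVCache:
    """Illustrative rolling key-value cache (not the repo's KVCache class).
    Keeps keys/values for at most `max_frames` recent frames."""

    def __init__(self, max_frames):
        self.keys = deque(maxlen=max_frames)
        self.values = deque(maxlen=max_frames)

    def append(self, k, v):
        # Once full, the oldest frame's entries are evicted automatically.
        self.keys.append(k)
        self.values.append(v)

    def context(self):
        # Attention for the next frame attends only over the cached window.
        return list(self.keys), list(self.values)

cache = RollingKVCache(max_frames=3)
for t in range(5):
    cache.append(f"k{t}", f"v{t}")
keys, _ = cache.context()
print(keys)  # ['k2', 'k3', 'k4'] - only the 3 most recent frames remain
```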
Files
self_forcing.py
- Core implementation with:
  - KVCache: rolling key-value cache for streaming
  - SimpleDiffusionModel: basic diffusion model with attention
  - SelfForcingTrainer: training logic with self-forcing
example.py
- Demonstration of training and streaming generation
Usage
```python
from self_forcing import SimpleDiffusionModel, SelfForcingTrainer

# Create model
model = SimpleDiffusionModel(input_dim=64, hidden_dim=128)

# Initialize trainer
trainer = SelfForcingTrainer(model, num_diffusion_steps=5)

# Train with self-forcing
trainer.train(dataloader, num_epochs=10)
```
Key Innovation
Unlike traditional autoregressive training that conditions on ground-truth frames, Self-Forcing conditions each frame on previously self-generated outputs during training, making the model more robust to its own prediction errors during inference.
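Conditioning on self-generated frames raises a cost question: backpropagating through the entire rollout grows with its length, which is why gradient truncation (listed above) matters. The toy model below, with a single weight and `frame_t = w * frame_{t-1}`, is an invented illustration of the trade-off: the full gradient accumulates a term per rollout step, while truncation treats earlier generated frames as constants, as a detach would in an autodiff framework.

```python
# Toy model: next frame = w * previous frame, so frame_T = w**T * x0.

def rollout(w, x0, steps):
    frames = [x0]
    for _ in range(steps):
        frames.append(w * frames[-1])
    return frames

def full_grad(w, x0, steps):
    # Backprop through the whole rollout:
    # d/dw (w**T * x0) = T * w**(T-1) * x0  -- one term per step.
    return steps * (w ** (steps - 1)) * x0

def truncated_grad(w, x0, steps):
    # Truncate the graph at the last step: treat the previous generated
    # frame as a constant c, so d/dw (w * c) = c.
    context = rollout(w, x0, steps - 1)[-1]
    return context

print(full_grad(2.0, 1.0, 3))       # 12.0
print(truncated_grad(2.0, 1.0, 3))  # 4.0 (cheaper, biased toward the last step)
```

The truncated gradient is biased but its cost is constant in rollout length, which is the trade-off the implementation's gradient truncation manages.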