Daily papers related to Image/Video/Multimodal Generation from cs.CV
January 25, 2026
Modern generative video models excel at producing convincing, high-quality outputs, but struggle to maintain multi-view and spatiotemporal consistency in highly dynamic real-world environments. In this work, we introduce AnyView, a diffusion-based video generation framework for dynamic view synthesis with minimal inductive biases or geometric assumptions. We leverage multiple data sources with various levels of supervision, including monocular (2D), multi-view static (3D) and multi-view dynamic (4D) datasets, to train a generalist spatiotemporal implicit representation capable of producing zero-shot novel videos from arbitrary camera locations and trajectories. We evaluate AnyView on standard benchmarks, showing competitive results with the current state of the art, and propose AnyViewBench, a challenging new benchmark tailored towards extreme dynamic view synthesis in diverse real-world scenarios. In this more dramatic setting, we find that most baselines drastically degrade in performance, as they require significant overlap between viewpoints, while AnyView maintains the ability to produce realistic, plausible, and spatiotemporally consistent videos when prompted from any viewpoint. Results, data, code, and models can be viewed at: https://tri-ml.github.io/AnyView/
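TLDR: AnyView is a diffusion-based video generation framework for dynamic view synthesis, trained on monocular, multi-view static, and multi-view dynamic data. It is competitive with the state of the art on standard benchmarks and stays spatiotemporally consistent on the new AnyViewBench benchmark for extreme viewpoint changes, where most baselines degrade sharply.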