CVPR 2025

Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification

S. P. Sharan, Minkyu Choi, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali

Abstract

Figure 1. Current generative video evaluation methods struggle with temporal fidelity. NeuS-V converts prompts into Temporal Logic specifications and formally verifies them against a video automaton. The upper video aligns with the prompt's temporal sequencing, while the lower video, despite being visually appealing, fails to do so. Unlike VBench, NeuS-V effectively differentiates between them.
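
To make the idea of checking temporal sequencing concrete, the following is a minimal illustrative sketch and not the NeuS-V implementation. It assumes each frame has already been reduced to hypothetical per-proposition confidence scores, thresholds them into Boolean labels, and checks the finite-trace property "eventually `first`, and from that point eventually `second`". The names `Frame`, `holds`, `eventually_then`, the proposition labels, and the 0.5 threshold are all assumptions introduced for illustration.

```python
from typing import Dict, List

# Hypothetical per-frame representation: proposition name -> detection confidence.
Frame = Dict[str, float]


def holds(frame: Frame, prop: str, threshold: float = 0.5) -> bool:
    """Treat a proposition as true in a frame if its confidence meets the threshold."""
    return frame.get(prop, 0.0) >= threshold


def eventually_then(frames: List[Frame], first: str, second: str,
                    threshold: float = 0.5) -> bool:
    """Check the finite-trace property F(first AND F(second)).

    Returns True if `first` holds in some frame and `second` holds in that
    frame or any later one.
    """
    seen_first = False
    for frame in frames:
        if holds(frame, first, threshold):
            seen_first = True
        if seen_first and holds(frame, second, threshold):
            return True
    return False


# Two toy "videos" with the same content in different temporal order.
in_order = [
    {"car_stops": 0.9, "person_crosses": 0.1},
    {"car_stops": 0.2, "person_crosses": 0.8},
]
out_of_order = [
    {"car_stops": 0.1, "person_crosses": 0.9},
    {"car_stops": 0.8, "person_crosses": 0.2},
]

in_order_ok = eventually_then(in_order, "car_stops", "person_crosses")
out_of_order_ok = eventually_then(out_of_order, "car_stops", "person_crosses")
```

In this toy setting the two traces contain the same events, yet only the first satisfies the ordering constraint, which is the kind of distinction the caption describes. A Boolean check like this discards the confidence values after thresholding; it is meant only to show the shape of the problem.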