V-JEPA 2 is an advanced world model enabling state-of-the-art visual understanding and prediction in the physical world, as well as zero-shot planning and robot control in new environments.
Built on the Joint Embedding Predictive Architecture (JEPA), the 1.2-billion-parameter model extends previous iterations to improve action prediction and world modeling.
V-JEPA 2 is trained with self-supervised learning on more than one million hours of video and then incorporates robot interaction data to enable action planning.
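To make the JEPA idea concrete, below is a minimal sketch (in PyTorch) of the kind of self-supervised objective involved: predicting the latent representations of masked video patches rather than reconstructing pixels. The module names (context_encoder, target_encoder, predictor), the masking scheme, the L1 loss, and the EMA coefficient are illustrative assumptions, not V-JEPA 2's actual implementation.

```python
# Illustrative sketch of a JEPA-style training step (not Meta's code).
import torch
import torch.nn.functional as F

def jepa_training_step(video_clip, mask, context_encoder, target_encoder,
                       predictor, optimizer):
    """One self-supervised step: predict latent features of masked patches.

    video_clip: (B, T, C, H, W) tensor of frames
    mask:       boolean tensor over patch positions hidden from the context
    The encoder/predictor modules are hypothetical stand-ins.
    """
    # Target features come from a slow-moving "teacher" encoder over the full
    # clip; gradients are stopped so the targets cannot collapse.
    with torch.no_grad():
        targets = target_encoder(video_clip)            # (B, N, D) patch embeddings

    # The context encoder only sees the unmasked portion of the clip.
    context = context_encoder(video_clip, mask=mask)    # (B, N_visible, D)

    # The predictor fills in embeddings for the masked positions.
    predictions = predictor(context, mask=mask)         # (B, N_masked, D)

    # Regress predicted embeddings onto the stop-gradient targets.
    loss = F.l1_loss(predictions, targets[:, mask])
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Slowly move the target encoder toward the context encoder (EMA update).
    with torch.no_grad():
        for p_t, p_c in zip(target_encoder.parameters(),
                            context_encoder.parameters()):
            p_t.mul_(0.999).add_(p_c, alpha=0.001)
    return loss.item()
```

The key design point this sketch illustrates is that the loss is computed in embedding space, so the model is not forced to predict every unpredictable pixel detail of future frames.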
Three new benchmarks—IntPhys 2, Minimal Video Pairs (MVPBench), and CausalVQA—have been introduced to evaluate models' physical reasoning capabilities.
The model achieves high success rates on robot tasks such as picking and placing objects in new environments, a step toward AI systems that can plan and act in the physical world.