1st ViSCALE Workshop @ CVPR2025

Test-time Scaling
for Computer Vision

June 11th-15th @ Nashville, US

Introduction

Test-time scaling, which has demonstrated great success in improving reasoning for LLMs (e.g., OpenAI o1/o3, DeepSeek-R1), holds significant promise for computer vision models as well. By allocating more computational resources during inference, vision models could achieve greater accuracy, robustness, and interpretability in complex tasks ranging from perception and understanding to reasoning and decision-making. This approach could enhance performance in high-stakes domains such as medical imaging, autonomous driving, and security surveillance, where precision and interpretability are crucial. Additionally, extending test-time scaling to multimodal models and generative architectures could foster more sophisticated cross-modal reasoning and higher-quality content generation.

However, applying test-time scaling to vision models presents unique challenges. Vision tasks typically involve high-dimensional inputs, making computational scaling at test time more resource-intensive. Efficient algorithms will be necessary to ensure that the increased computation does not lead to impractical processing times or energy consumption. Moreover, ensuring robustness and safety when models are subjected to increased inference computation, particularly in dynamic or adversarial environments, will be crucial.

The Workshop on Test-time Scaling for Computer Vision (ViSCALE) at CVPR2025 aims to explore the frontiers of scaling test-time computation in vision models, addressing both theoretical advancements and practical implementations. We will discuss the suitability of test-time scaling for traditional vision tasks like perception and its extension to multimodal and generative models, with the goal of enhancing performance in critical domains. The workshop will also cover efficient algorithms, considerations of robustness and safety, and novel problems in computer vision posed by test-time scaling. By bringing together experts, the workshop seeks to foster collaboration and innovation in applying this paradigm to push the limits of computer vision.

Important Dates

Submission Deadline: March 15, 2025 (AoE)
Notification of Acceptance: March 29, 2025 (AoE)
Camera-Ready Submission: April 14, 2025 (AoE)
Workshop Date: TBD

Call for Papers

We welcome submissions related to different aspects of test-time scaling for computer vision, including but not limited to the following topics:

Information for Submission

Format: We consider 3 types of submissions:

Please follow the CVPR2025 author guidelines and use the official template.

Submission Site: OpenReview

Submission Deadline: March 15, 2025 (AoE)

Best Paper Award: We will select 1 to 3 papers for best paper awards and oral presentations, based on the quality of the submissions. Each awarded paper will receive a prize of up to $1000.

Speakers

Ludwig Schmidt, Stanford University
Saining Xie, New York University
Trevor Darrell, UC Berkeley

Organizers

Hang Su, Tsinghua University
Yinpeng Dong, Tsinghua University
Yichi Zhang, Tsinghua University
Jindong Gu, University of Oxford
Cihang Xie, UC Santa Cruz
Bolei Zhou, UC Los Angeles
Jun Wang, University College London
Jun Zhu, Tsinghua University
Philip Torr, University of Oxford
Shiguang Shan, Chinese Academy of Sciences
Wanli Ouyang, Shanghai AI Laboratory
Shuicheng Yan, National University of Singapore

Sponsors

We are sincerely grateful for the support of all our sponsors.

Contact

For any inquiries, please contact the official email, viscalecvpr@gmail.com, or our organizers: Yinpeng Dong (dongyinpeng@mail.tsinghua.edu.cn) and Yichi Zhang (zyc22@mails.tsinghua.edu.cn).