Introduction

Test-time scaling, which has shown remarkable success in improving reasoning for large language models, holds significant promise for computer vision and multimodal systems. By allocating additional computation during inference, vision models can enhance accuracy, robustness, and interpretability in complex reasoning tasks. Recent advances in the "thinking with images" paradigm, where models perform visual chain-of-thought reasoning through iterative perception and synthesis, suggest a shift toward visually grounded cognition rather than purely symbolic inference. Extending test-time scaling to this setting could enable adaptive visual reasoning, where models selectively focus computation on ambiguous or conceptually rich regions. Coupled with emerging trends such as multimodal reflection, self-evaluation, and scalable visual generation, this approach paves the way for more general, controllable, and interpretable vision reasoning systems. However, several challenges remain: scaling inference on high-dimensional visual inputs is computationally expensive, efficient allocation of resources is still an open problem, and ensuring robustness, safety, and energy efficiency under expanded test-time computation is difficult.

The 2nd Workshop on Test-time Scaling for Computer Vision (ViSCALE) aims to explore the frontiers of scaling test-time computation in vision models, addressing both theoretical advancements and practical implementations. We will discuss the suitability of test-time scaling for traditional vision tasks such as perception, along with its extension to multimodal and generative models, with the aim of enhancing performance in critical domains. The workshop will also cover efficient algorithms, robustness and safety considerations, and novel computer vision problems posed by test-time scaling. By bringing together experts, the workshop seeks to foster collaboration and innovation in applying this paradigm to push the limits of computer vision.

Call for Papers

We invite submissions of original research papers, work-in-progress papers, and extended abstracts. Topics of interest include but are not limited to:

Theoretical analysis of test-time scaling in computer vision
Test-time scaling for high-level visual reasoning (e.g., spatial reasoning, planning)
Extensions to multimodal foundation models and world models
Efficient algorithms for inference-time scaling
Trustworthiness, robustness, and safety in scaled vision models
Benchmarks and evaluations for test-time scaling techniques

Submission Guidelines:

All submissions will be handled via OpenReview. The review process is double-blind. All papers must be formatted using the CVPR 2026 Author Kit. We welcome different types of submissions to the workshop, including:

Track 1: Full Papers (Archival)

Length: Up to 8 pages (excluding references)
Description: Accepted papers will be published in the CVPR 2026 Workshop Proceedings and IEEE Xplore.

Track 2: Extended Abstracts (Non-Archival)

Length: Up to 4 pages (excluding references)
Description: Best for preliminary work or work already published elsewhere. These will NOT appear in the proceedings.
Instructions: For submissions to this track, the Appendix will not be reviewed. The submission must be limited to 4 pages of content. Include the label "EXTENDED ABSTRACT" or "SHORT" at the start of the title to distinguish it from regular submissions. If this label is not included, the submission will be treated as a regular one.

We strongly encourage authors to carefully follow the CVPR Author Guidelines, since our workshop will adhere to the same formatting and submission policies as the main conference.

Important Deadlines

February 10, 2026: Submission Begins
March 10, 2026 (AOE): Submission Deadline
March 18, 2026: Author Notification and Meta-Data Collection
March 31, 2026: Camera-Ready Submission
June 3 / 4, 2026: Workshop Day

Schedule

TBD

Keynote Speakers

Manling Li, Northwestern University

Ranjay Krishna, University of Washington

Sergey Levine, UC Berkeley

Mahmoud Assran, Meta AI

Haoqi Fan, TikTok

Ziwei Liu, Nanyang Technological University

Organizers

Yinpeng Dong, Tsinghua University

Yichi Zhang, Tsinghua University

Yu Huang, Tsinghua University

Shilong Liu, Princeton University

Xueyan Zou, Tsinghua University

Cihang Xie, UC Santa Cruz

Dan Zhang, Bosch

Miao Liu, Tsinghua University

Jindong Gu, University of Oxford

Lingjuan Lyu, Sony

Hang Su, Tsinghua University

Jun Zhu, Tsinghua University

Shiguang Shan, Chinese Academy of Sciences

Shuicheng Yan, National University of Singapore

Past Editions

Explore previous workshops and their contributions to the field

1st Edition

Completed

2025

ViSCALE Workshop

Test-time Scaling for Computer Vision

Date: June 12, 2025

Location: Nashville, TN, USA

Conference: CVPR 2025

Contact

For any inquiries, feel free to reach out to us via email at viscalecvpr@gmail.com. You may also contact the organizers directly: Yinpeng Dong, Yichi Zhang.

2nd ViSCALE: Test-Time Scaling for Computer Vision

CVPR 2026 @ Denver, Colorado