Enhancing Spatial Reasoning in Vision-Language Models via Monocular Depth Estimation: A Comparative Study on SpatialBench

Publication Type : Journal Article

Publisher : IEEE

Source : 2025 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI)

Url : https://doi.org/10.1109/cvmi66673.2025.11337662

Campus : Coimbatore

School : School of Physical Sciences

Department : Mathematics

Year : 2025

Abstract : Understanding spatial relationships is valuable for Vision-Language Models (VLMs) deployed in real-world tasks such as robotic grasping and self-driving navigation. Existing VLMs trained only on RGB images reason poorly about spatial relationships because they lack depth perception. In this paper, we address this limitation by incorporating Monocular Depth Estimation (MDE) into VLM fine-tuning. We employ three state-of-the-art MDE models (ZoeDepth, Depth Anything V2, and DepthPro) to generate depth maps for a large variety of images from SpatialQA. The depth-enhanced images are used to fine-tune Mini-InternVL-1.5, a lightweight VLM with 2 billion parameters. We compare the spatial reasoning abilities of the base and fine-tuned models on the SpatialBench benchmark, varying the depth estimation model to study which yields better spatial reasoning. Fine-tuning with depth information significantly enhances spatial awareness, particularly in counting, object existence, and reachability tasks. Of the three MDE models, ZoeDepth consistently yields the largest performance gains. These findings highlight the importance of incorporating depth cues into VLM training pipelines to unlock their full potential in spatial relationship tasks.

Cite this Research Publication : Karthik Prasad G, Murali Krishna Panthangi, Enhancing Spatial Reasoning in Vision-Language Models via Monocular Depth Estimation: A Comparative Study on SpatialBench, 2025 IEEE International Conference on Computer Vision and Machine Intelligence (CVMI), IEEE, 2025, https://doi.org/10.1109/cvmi66673.2025.11337662
