Results
Results on GUI tasks. A red background indicates that the data source is in the training set of the corresponding model, while a green background indicates that the test dataset is OOD for the model. Bold marks the best result in the OOD setting, and underline marks the second-best.
Results on spatial affordance prediction. These results demonstrate that NaviMaster’s fine-grained visual–spatial alignment significantly enhances performance in both object-level and free-space referring.
Results on embodied navigation. Because we are the first to train an agent model that generalizes within VLMNav, no prior navigation models trained under VLMNav are available for direct comparison. NaviMaster achieves the highest Success Rate and SPL, a substantial improvement over the base model.
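For readers unfamiliar with the two navigation metrics, the following is a minimal sketch of how Success Rate and SPL are commonly computed. It assumes the standard definition of SPL (Success weighted by Path Length), in which each successful episode contributes the ratio of the shortest-path length to the longer of the shortest path and the agent's actual path. The `Episode` record, its field names, and the handling of zero-length paths are hypothetical and are not taken from VLMNav or the NaviMaster code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative only."""
    success: bool          # whether the agent reached the goal
    shortest_path: float   # shortest-path (geodesic) distance from start to goal
    agent_path: float      # length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes in which the agent reached the goal."""
    if not episodes:
        raise ValueError("at least one episode is required")
    return sum(1 for e in episodes if e.success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    if not episodes:
        raise ValueError("at least one episode is required")
    total = 0.0
    for e in episodes:
        if not e.success:
            continue  # failed episodes contribute zero
        denom = max(e.agent_path, e.shortest_path)
        # Assumption: a successful episode with zero-length paths counts as 1.0.
        total += e.shortest_path / denom if denom > 0 else 1.0
    return total / len(episodes)
```

As a usage example, three episodes with shortest-path length 10, where the agent succeeds with path lengths 10 and 20 and fails once, give a Success Rate of 2/3 and an SPL of (1 + 0.5 + 0) / 3 = 0.5. SPL is never higher than Success Rate, since it discounts successes that take longer than the shortest path.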