Despite the common use of visual aids for spatial guidance, extensive reliance on visual elements can lead to interface overcrowding and visual overload. Tactile navigation, in contrast, offers a vision-free alternative for way-finding tasks. However, previous studies kept directional cues constantly active during experiments, overlooking potential differences in user interaction behavior caused by variations in interpretation speed and localization accuracy among visual, tactile, and combined spatial guidance. Additionally, these studies typically simplified navigation tasks, focusing on cue following without the multitasking demands of large-scale virtual environments. This paper aims to fill this gap by investigating the effects of visual, vibrotactile (equidistant and localized-magnification), and combined (visual + equidistant and visual + localized-magnification) spatial guidance modalities on task performance, user behavior, and user tendency in large-scale virtual environments. By building an immersive virtual scene that enables continuous tracking of participants' performance and interaction behaviors, we found no significant disparities in task completion times or way-finding strategies across modalities. However, tactile modalities were activated more frequently but for shorter durations than visual and combined modalities. Moreover, when both visual and tactile directional cues were presented, users tended to rely on the visual cues.