Despite extensive research across various modalities, the precise mechanisms of sensory attenuation (SA) remain debated. Forward models suggest that efference copies of motor commands enable the brain to predict and distinguish anticipated changes in self-initiated sensory input. Predictive processing proposes that predictions about upcoming changes in sensory input are not based solely on efference copies, but arise from a generative model that also integrates external, contextual factors. This study investigated the mechanisms underlying SA in the tactile domain, specifically examining self-initiation and temporal predictions within a virtual reality (VR) framework. Participants (N = 33) engaged in an active condition, moving their hands to elicit a virtual touch. Importantly, visual perception was modified in VR so that participants touched their rendered, but not their physical, hands. The virtual touch triggered test vibrations on a touch controller, the intensity of which was then compared to that of a standard stimulus. In the passive condition, vibrations were presented without movement and were preceded by a visual cue. Further, test vibrations were presented either immediately or after a variable onset delay. Our results revealed a significant effect of the factor "onset delay" on perceived vibration intensity. Additionally, we observed interactions between the factors "agency" and "test vibration intensity" and between the factors "agency" and "onset delay", with attenuation occurring only for immediate vibrations at high intensities. These findings emphasize the impact of external, contextual factors and support the notion of a broader, attention-oriented predictive mechanism for the perception of self-initiated stimuli.