Perception and action are inherently entangled: our world view is shaped by how we explore our environment through complex and variable self-motion. Even when fixating stable stimuli, our eyes undergo small, involuntary movements. Fixational eye movements (FEM) render a stable world jittery on our retinae, which can be expected to harm neural coding. Yet, empirical evidence suggests that FEM help rather than harm human perception of fine detail. Here, we elucidate this paradox by identifying the conditions under which FEM improve or impair retinal coding and human acuity. We combine theory and experiment: model accuracy is directly compared to that of healthy human subjects in a visual acuity task. Acuity is modeled by applying an ideal Bayesian classifier to simulations of retinal spiking activity in the presence of FEM. In addition, empirical FEM are monitored using high-resolution eye-tracking with an adaptive optics scanning laser ophthalmoscope. FEM introduce variability in retinal ganglion cell activity, but they also effectively preprocess inputs to facilitate retinal information encoding. Based on an interplay of these mechanisms, our model predicts a relation between visual acuity, FEM amplitude, and single-trial stimulus size that quantitatively accounts for experimental observations and captures the beneficial effect of FEM. Moreover, we observe that human subjects' FEM statistics vary with stimulus size, yet our model suggests that subjects' FEM amplitude remains within a near-optimal range, where acuity is enhanced compared to much larger or smaller amplitudes. Overall, our findings indicate that perception benefits from action even at the fine spatiotemporal scale of FEM.