Reinforcement Learning (RL) is a well-known method for learning to control complex and unknown dynamical systems. In this paper, we propose a solution that addresses a major limitation of existing RL schemes: interleaving the environment interaction step with the learning step. Reconciling the approximation complexity of neural networks with real-time learning requirements is one of several reasons why RL has not been adopted more widely in practical control systems. Our online learning solution with near real-time capability is demonstrated on a model-reference tracking control problem, where the underlying system state is encoded as a moving window of past output and input signals, expanded with the reference model state and the reference input. The value function and controller neural networks are trained online using backpropagation, based on interaction experiences with the system. Two case studies, one in simulation and one on real hardware, validate the proposed methodology. We compare learning performance and operation times under two popular, high-level software packages with automatic differentiation capabilities, under both synchronous and asynchronous updates. The software challenges are discussed in detail based on code runtime measurements. We conclude that, for lower-order systems with relatively fast dynamics and adaptive characteristics, there is a strong incentive to further develop online synchronous RL toward real-time requirements, while asynchronous online RL motivates scaling the learning method up to higher-dimensional systems with faster dynamics, even in setups without hard real-time constraints.
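As an illustrative sketch of the state encoding described above (the window length, dimensions, and function names here are assumptions for exposition, not the paper's actual implementation), the learning state can be assembled from a moving window of past outputs and inputs, concatenated with the reference model state and the reference input:

```python
from collections import deque
import numpy as np

def make_state(y_hist, u_hist, x_ref, r_ref):
    """Concatenate windows of past outputs/inputs with the
    reference model state and reference input (illustrative)."""
    return np.concatenate([np.ravel(y_hist), np.ravel(u_hist),
                           np.ravel(x_ref), np.ravel(r_ref)])

# moving windows of the N most recent output and input samples
N = 3
y_hist = deque(maxlen=N)    # past measured outputs
u_hist = deque(maxlen=N)    # past applied inputs
for k in range(5):          # simulate a few time steps
    y_hist.append(0.1 * k)  # placeholder output measurement
    u_hist.append(0.2 * k)  # placeholder control input

x_ref = np.array([1.0, 0.0])  # reference model state (assumed 2nd order)
r_ref = np.array([1.0])       # reference input

s = make_state(y_hist, u_hist, x_ref, r_ref)
print(s.shape)  # state dimension = 2*N + 2 + 1 → (9,)
```

This vector would then serve as the input to both the value function and the controller networks; the deques automatically discard the oldest sample as the window slides forward.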