When the current demand shock is observable and the discount factor is high, Q-learning agents predominantly learn to implement symmetric rigid pricing, i.e., they charge constant prices across demand states. Under this pricing pattern, supra-competitive profits can still be obtained and are sustained by collusive strategies that effectively punish deviations. This shows that Q-learning agents can overcome the stronger incentive to deviate during positive demand shocks, so that algorithmic collusion persists under observed demand shocks. In contrast, with a medium discount factor, Q-learning agents learn that maintaining high prices during positive demand shocks is not incentive compatible; instead, they proactively charge lower prices to reduce the temptation to deviate, while maintaining relatively high prices during negative demand shocks. As a result, a countercyclical pricing pattern becomes predominant, in line with the theoretical prediction of Rotemberg and Saloner (1986). These findings highlight how Q-learning algorithms can both adapt their pricing strategies and sustain tacit collusion in response to complex market conditions.
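To make the setting concrete, the following is a minimal sketch of this kind of experiment: two tabular Q-learning agents repeatedly set prices on a discrete grid, observe an i.i.d. binary demand shock each period, and condition on both firms' previous-period prices. The price grid, demand specification, marginal cost, learning rate, and exploration schedule below are illustrative assumptions rather than the paper's calibration; varying the discount factor `DELTA` between a high and a medium value allows one to compare the rigid and countercyclical pricing patterns described above.

```python
# Minimal sketch of a duopoly pricing simulation with an observable demand shock.
# All parameter values are illustrative assumptions, not the paper's calibration.
import numpy as np

rng = np.random.default_rng(0)

PRICES = np.linspace(1.0, 2.0, 11)       # discrete price grid (assumed)
DEMAND_STATES = [0, 1]                    # 0 = negative shock, 1 = positive shock
INTERCEPT = {0: 2.0, 1: 3.0}              # higher demand intercept under a positive shock
COST = 1.0                                # constant marginal cost (assumed)
DELTA = 0.95                              # discount factor (high-delta regime)
ALPHA = 0.1                               # learning rate
EPISODES = 200_000
BETA = 2e-5                               # epsilon decay rate for exploration

def profits(p_i, p_j, state):
    """Homogeneous-good Bertrand demand: lower price serves the market; ties split it."""
    a = INTERCEPT[state]
    if p_i < p_j:
        q_i, q_j = max(a - p_i, 0.0), 0.0
    elif p_i > p_j:
        q_i, q_j = 0.0, max(a - p_j, 0.0)
    else:
        q_i = q_j = max(a - p_i, 0.0) / 2
    return (p_i - COST) * q_i, (p_j - COST) * q_j

# State = (demand shock, both firms' last price indices); Q[firm][state][action]
n_p = len(PRICES)
Q = np.zeros((2, len(DEMAND_STATES), n_p, n_p, n_p))

last = (int(rng.integers(n_p)), int(rng.integers(n_p)))   # initial price indices
for t in range(EPISODES):
    d = int(rng.integers(2))              # observable i.i.d. demand shock
    eps = np.exp(-BETA * t)               # decaying epsilon-greedy exploration
    acts = []
    for i in range(2):
        if rng.random() < eps:
            acts.append(int(rng.integers(n_p)))
        else:
            acts.append(int(np.argmax(Q[i, d, last[0], last[1]])))
    pi0, pi1 = profits(PRICES[acts[0]], PRICES[acts[1]], d)
    # Q-update: continuation value averages over next period's (uniform) demand shock
    for i, r in enumerate((pi0, pi1)):
        cont = np.mean([Q[i, dn, acts[0], acts[1]].max() for dn in DEMAND_STATES])
        old = Q[i, d, last[0], last[1], acts[i]]
        Q[i, d, last[0], last[1], acts[i]] = (1 - ALPHA) * old + ALPHA * (r + DELTA * cont)
    last = (acts[0], acts[1])

# Inspect greedy prices by demand state: rigid pricing shows the same price in both
# states, countercyclical pricing shows lower prices in the positive-shock state.
for d in DEMAND_STATES:
    greedy = [PRICES[int(np.argmax(Q[i, d, last[0], last[1]]))] for i in range(2)]
    print(f"demand state {d}: greedy prices {greedy}")
```

Under these assumptions, comparing the printed greedy prices across demand states for `DELTA = 0.95` versus a medium value such as `DELTA = 0.80` gives a simple diagnostic for whether learned play is rigid or countercyclical.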