Deep reinforcement learning has been widely applied to anti-jamming problems in wireless communications and has achieved good results. However, most existing research assumes that the communication system can obtain complete Channel State Information (CSI). To handle limited CSI, this paper models the system as a Partially Observable Markov Decision Process (POMDP). In addition, for decision algorithms that use an exponentially decaying exploration rate, it is difficult to determine the optimal decay factor; this paper therefore proposes an algorithm that automatically adjusts the exploration rate decay factor. A Deep Recurrent Q-Network (DRQN) architecture suited to the scenario is also designed, together with an intelligent anti-jamming decision algorithm: Long Short-Term Memory (LSTM) networks first learn the temporal features of the input data, the features are flattened, and the result is fed into fully connected layers to produce the anti-jamming strategy. Simulation results show that the automatic adjustment algorithm achieves near-optimal performance even when initialized with a large decay factor. Under periodic jamming and intelligent blocking jamming, the proposed algorithm reduces the number of time slots required for convergence by 45% and 32%, respectively, compared with Double DQN (DDQN), the best-performing baseline; its normalized throughput after convergence is slightly higher, and its convergence performance is significantly better than that of Deep Q-Network (DQN) and Q-Learning (QL).
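As a rough illustration of the pipeline described above, the sketch below pairs an LSTM front end (temporal features, flattened, then fully connected layers producing per-action Q-values) with a hypothetical rule for auto-adjusting the exploration rate decay factor. The layer sizes, sequence length, and the adjustment rule itself (the names `reward_target` and `shrink`, and the reward-threshold test) are illustrative assumptions, not the paper's actual settings.

```python
# Illustrative sketch only. Layer sizes, sequence length, and the
# decay-factor adjustment rule are assumptions, not the paper's settings.
import math

import torch
import torch.nn as nn


class DRQN(nn.Module):
    """LSTM learns temporal features of the observation sequence; the
    features are flattened and fed to fully connected layers that output
    one Q-value per candidate action (e.g., per channel)."""

    def __init__(self, obs_dim: int, n_actions: int,
                 seq_len: int = 8, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Flatten(),                         # (batch, seq_len * hidden_dim)
            nn.Linear(seq_len * hidden_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),            # Q-values for the strategy
        )

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, seq_len, obs_dim) of partial observations (POMDP)
        features, _ = self.lstm(obs_seq)          # (batch, seq_len, hidden_dim)
        return self.head(features)


def epsilon(t: int, lam: float) -> float:
    # Exponentially decaying exploration rate: eps_t = exp(-lam * t).
    return math.exp(-lam * t)


def adjust_decay_factor(lam: float, recent_rewards: list[float],
                        reward_target: float = 0.9,
                        shrink: float = 0.9) -> float:
    """Hypothetical adjustment rule: while the recent average reward is still
    below the target, reduce lam so exploration decays more slowly; an
    over-large initial lam is thereby pulled back toward a workable value."""
    avg = sum(recent_rewards) / max(len(recent_rewards), 1)
    return lam * shrink if avg < reward_target else lam
```

Action selection would then be epsilon-greedy over the Q-values, with the decay factor re-adjusted periodically from recent rewards so that an over-large initial factor does not cut exploration off too early.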