Voice is a critical tool for communication, and diagnosing voice disorders poses significant challenges, particularly when using high-speed video (HSV) endoscopy. The primary difficulty with HSV lies in the need for clinical experts to manually analyze and interpret large volumes of HSV frames. To address this challenge, kymography has been introduced as an effective clinical decision-support tool. In this study, we propose a deep learning-based approach for classifying kymographic images that automates the analysis by training models to detect subtle and intricate variations in pathological vibratory patterns. We used high-speed recordings from the Benchmark for Automatic Glottis Segmentation (BAGLS) dataset to generate kymographic images, which were then used for binary and three-class classification with deep learning models. We evaluated the performance of five widely used pretrained models: AlexNet, DenseNet121, Xception, InceptionV3, and ResNet50V2. Our experimental results demonstrate that DenseNet121 classifies voice disorders automatically with the highest accuracy and the best performance across the evaluation metrics, outperforming existing methods. With further research, the deep learning classifier has the potential to become a valuable diagnostic-assistance tool for clinicians.
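To make the pipeline described above concrete, the following is a minimal sketch, not the authors' exact implementation: it builds a kymogram by stacking a fixed scan line from each HSV frame over time, and fine-tunes an ImageNet-pretrained DenseNet121 for binary (healthy vs. pathological) classification using TensorFlow/Keras. The function names (`build_kymogram`, `build_classifier`), the scan-line placement, the input resolution, and all training hyperparameters are illustrative assumptions, as the abstract does not specify the preprocessing or training settings.

```python
# Illustrative sketch (assumed API usage, not the paper's exact pipeline):
# kymogram construction from HSV frames + DenseNet121 transfer learning.
import numpy as np
import tensorflow as tf


def build_kymogram(frames: np.ndarray, line_row: int) -> np.ndarray:
    """Stack one fixed pixel row from every HSV frame over time.

    frames: array of shape (T, H, W) holding grayscale HSV frames.
    line_row: row index of the scan line (typically across the glottal midline).
    Returns a (T, W) kymographic image.
    """
    return np.stack([frame[line_row] for frame in frames], axis=0)


def build_classifier(input_shape=(224, 224, 3)) -> tf.keras.Model:
    """DenseNet121 backbone with a small binary-classification head."""
    backbone = tf.keras.applications.DenseNet121(
        include_top=False, weights="imagenet", input_shape=input_shape
    )
    backbone.trainable = False  # warm-up phase: train only the new head
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.applications.densenet.preprocess_input(inputs)
    x = backbone(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # healthy vs. pathological
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )
    return model


if __name__ == "__main__":
    # Dummy HSV clip: 512 frames of 256x256 grayscale pixels (placeholder data).
    frames = np.random.rand(512, 256, 256).astype("float32")
    kymogram = build_kymogram(frames, line_row=128)            # shape (512, 256)
    # Replicate to 3 channels and resize to the network's input resolution.
    img = tf.image.resize(kymogram[..., None], (224, 224))
    img = tf.repeat(img, 3, axis=-1) * 255.0                   # preprocess_input expects [0, 255]
    model = build_classifier()
    prob = model.predict(img[None, ...])                       # shape (1, 1)
    print("Predicted probability of pathology:", float(prob[0, 0]))
```

For the three-class setting, the same sketch would swap the single sigmoid output for a 3-unit softmax layer trained with categorical cross-entropy.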